arctern-io / arctern

Apache License 2.0

GeoSeries does not support to_sql #654

Open czpmango opened 4 years ago

czpmango commented 4 years ago

Describe the bug: GeoSeries does not behave as expected when to_sql is called.

To Reproduce: my test code is below.

import pandas as pd
import pygeos
from arctern import GeoSeries

def trans2wkb4series(s, index=None):
    """Convert a pandas Series of WKT strings to a Series of WKB bytes."""
    # Check the type first, so s.size is never read on a non-Series.
    if not isinstance(s, pd.Series):
        return None
    if index is None:
        index = range(s.size)
    try:
        # Empty or missing entries stay None; everything else is parsed as WKT.
        s_arr = [pygeos.to_wkb(pygeos.Geometry(v)) if v else None for v in s]
    except Exception:
        return None
    return pd.Series(s_arr, index=index)

geo_s = GeoSeries(["POINT (9 0)","POLYGON ((1 1,1 2,2 2,1 1))"])
pd_s = pd.Series(["POINT (9 0)","POLYGON ((1 1,1 2,2 2,1 1))"])
pd_s_wkb = trans2wkb4series(pd_s)
pd.testing.assert_series_equal(geo_s.astype(object),pd_s_wkb.astype(object),check_dtype=False) # (as expected)

geo_df=pd.DataFrame({'val':geo_s})
pd_df=pd.DataFrame({'val':pd_s_wkb})
pd.testing.assert_frame_equal(geo_df,pd_df,check_dtype=False)

from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
# geo_res1 = geo_df.to_sql('users', con=engine, if_exists='replace', index_label='id')  # does not work
pd_res1 = pd_df.to_sql('users', con=engine, if_exists='replace', index_label='id')
engine.execute("SELECT * FROM users").fetchall() # as expected

Expected behavior: calling to_sql on a DataFrame backed by a GeoSeries should give the same result as calling it on the equivalent pandas Series (pandas API: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html).

shengjh commented 4 years ago

Reproduced with pandas v1.0.3. Here is simplified code to reproduce it:

import pandas as pd
from arctern import GeoSeries
geo_s = GeoSeries(["POINT (9 0)","POLYGON ((1 1,1 2,2 2,1 1))"])
geo_df=pd.DataFrame({'val':geo_s})
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
geo_res1 = geo_df.to_sql('users', con=engine, if_exists='replace', index_label='id')

It crashes with:

File "pandas/_libs/lib.pyx", line 1127, in pandas._libs.lib._try_infer_map
AttributeError: 'GeoDtype' object has no attribute 'base'

I checked the pandas source code: https://github.com/pandas-dev/pandas/blob/3adf3340453d6704d4a2cb47058214cc697a7d29/pandas/_libs/lib.pyx#L1120-L1130 When pandas tries to infer the data type for our GeoDtype, it calls getattr on the dtype for 'name', 'kind', and 'base' in turn. The GeoDtype class has no 'base' attribute, so the lookup fails with AttributeError.
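To make the failure mode concrete, here is a minimal pure-Python sketch of the lookup described above. It is an imitation, not the pandas code itself: `TYPE_MAP` is a small illustrative subset, and `FakeGeoDtype` and `PatchedGeoDtype` are hypothetical stand-ins whose `name` and `kind` values are assumptions (arctern is not imported).

```python
# Simplified imitation of the attribute lookup in pandas' _try_infer_map.
# TYPE_MAP is an illustrative subset, not the real pandas table.
TYPE_MAP = {"int64": "integer", "f": "floating", "b": "boolean"}


def try_infer_map(dtype):
    """Return the mapped type for the first matching attribute, else None."""
    for attr in ("name", "kind", "base"):
        # getattr without a default raises AttributeError if attr is missing.
        val = getattr(dtype, attr)
        if val in TYPE_MAP:
            return TYPE_MAP[val]
    return None


class FakeGeoDtype:
    """Hypothetical stand-in for GeoDtype: has name and kind, but no base."""
    name = "GeoDtype"  # assumed value, for illustration only
    kind = "O"         # assumed value, for illustration only


class PatchedGeoDtype(FakeGeoDtype):
    """Same stand-in with a base attribute added (returns itself)."""

    @property
    def base(self):
        return self


def describe(dtype):
    """Run the lookup and report either its result or the error it raised."""
    try:
        return repr(try_infer_map(dtype))
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    print(describe(FakeGeoDtype()))     # lookup reaches 'base' and fails
    print(describe(PatchedGeoDtype()))  # lookup completes, nothing matches
```

In this sketch, giving the dtype a `base` attribute lets the loop finish and return None instead of raising. Whether that is the right fix for arctern's GeoDtype is a separate question that this example does not settle.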