geoarrow / geoarrow-python

Python implementation of the GeoArrow specification
http://geoarrow.org/geoarrow-python/
Apache License 2.0

to_geopandas method returns an error #18

Open nagyrobir opened 11 months ago

nagyrobir commented 11 months ago

Hi!

This is sort of a continuation of #16. When I try to convert the original dataset with the "to_geopandas" method, I get the error below. Is there anything I am doing improperly?

import geopandas as gpd
import pyarrow.parquet as pa
from pyarrow.parquet import read_table
import shapely
import geoarrow.pyarrow as ga

tb = read_table(r"/home/parquet/buildings.parquet")
dataset = ga.dataset(tb, geometry_columns=["geometry"])

gp = ga.to_geopandas(dataset.to_table())

gp=ga.to_geopandas(dataset.to_table())
Traceback (most recent call last):
  File "", line 1, in
  File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 592, in to_geopandas
    wkb_array_or_chunked = as_wkb(obj)
  File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 267, in as_wkb
    return as_geoarrow(obj, _type.wkb())
  File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 280, in as_geoarrow
    obj = obj_as_array_or_chunked(obj)
  File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 30, in obj_as_array_or_chunked
    return array(obj_in, validate=False)
  File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_array.py", line 152, in array
    arr = pa.array(obj, *args, **kwargs)
  File "pyarrow/array.pxi", line 327, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert <pyarrow.lib.ChunkedArray object at 0x7f021c19b740>
[
  [ 1, 2, 3, 4, 5, ... 65532, 65533, 65534, 65535, 65536 ],
  [ 65537, 65538, 65539, 65540, 65541, ... 131068, 131069, 131070, 131071, 131072 ],
  ...,
  [ 196609, 196610, 196611, 196612, 196613, ... 262140, 262141, 262142, 262143, 262144 ],
  [ 262145, 262146, 262147, 262148, 262149, ... 318059, 318060, 318061, 318062, 318063 ]
]
with type pyarrow.lib.ChunkedArray: did not recognize Python value type when inferring an Arrow data type

paleolimbot commented 11 months ago

I think this one is because to_geopandas() doesn't currently handle a pyarrow.Table. It perhaps should (or at least should do a better job erroring), but in the meantime I think you can do:

# ...
df = dataset.to_table().to_pandas()
df.geometry = df.geometry.geoarrow.to_geopandas()
# ...I forget exactly how to get a pandas.DataFrame into a geopandas.GeoDataFrame
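
For the last step left open above, one option that should work is the geopandas.GeoDataFrame constructor with its geometry argument. This is only a minimal sketch: the small DataFrame below is a made-up stand-in for the result of the two lines above (a pandas.DataFrame whose "geometry" column already holds geopandas geometries), and the "id" column, the points, and the CRS are assumptions for illustration, not data from this issue.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical stand-in for `df` after the geometry column has been
# converted: a plain pandas.DataFrame with a geometry-typed column.
df = pd.DataFrame({"id": [1, 2]})
df["geometry"] = gpd.GeoSeries([Point(0, 0), Point(1, 1)], crs="EPSG:4326")

# Wrap the DataFrame; `geometry=` names the column to use as the
# active geometry of the resulting GeoDataFrame.
gdf = gpd.GeoDataFrame(df, geometry="geometry")

print(type(gdf).__name__)
print(gdf.geometry.name)
```

With the real data, the same call would be applied to the df built from dataset.to_table().to_pandas() once its geometry column has been converted.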

nagyrobir commented 11 months ago

I saw that type(dataset.to_table()) returns a pyarrow.lib.Table. I guess getting it into geopandas from pandas works well then, so I can stick to that; I thought I was doing something wrong. I think you guys are doing a wonderful job, and thank you again! I'll hang around to see what new things you push to the repo!

jorisvandenbossche commented 11 months ago

(re-opening, because we should make this work)