geoarrow / geoarrow-rs

GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations
http://geoarrow.org/geoarrow-rs/
Apache License 2.0

GeoTable.from_arrow doesn't recognize geometry column from PointArray #589

Open deanm0000 opened 5 months ago

deanm0000 commented 5 months ago

I started from something like

import pyarrow as pa
from geoarrow.rust.core import GeoTable, PointArray

my_point_array = PointArray.from_xy(
    pa.array([-160.49, -87.35, -88.01], pa.float64()),
    pa.array([55.34, 33.46, 31.01], pa.float64()),
)

arrow_table = pa.Table.from_arrays(
    [pa.array([1, 2, 3], pa.int32()), my_point_array],
    names=["a", "geometry"],
)

then tried

GeoTable.from_arrow(arrow_table)

but got

PanicException: no geometry column in table

I also tried a few things around ChunkedPointArray.from_arrow_arrays([my_point_array]) but none of it worked.

kylebarron commented 5 months ago

Thanks for trying it out!

I agree having better geometry constructors will be necessary for usability. GeoArrow defines extension metadata that needs to be on an array to declare it a geometry. Your issue is that when you call pa.Table.from_arrays, the field for each array is inferred from the data type of the arrays. But the inferred field won't have any metadata applied to it.

One way to fix this is to use the schema parameter of from_arrays to ensure there's geoarrow metadata on the geometry column.

The other way is to register the pyarrow extension types provided in geoarrow-pyarrow. In that case, I believe the extension metadata will be automatically inferred.

For now, I've put more effort into the IO readers and writers and into the GeoPandas and Shapely interoperability. So a simple way to get a GeoTable is to first create a geopandas.GeoDataFrame and then use geoarrow.rust.core.from_geopandas.

deanm0000 commented 5 months ago

> first create a geopandas.GeoDataFrame

I'm trying to quit doing that ;)

> The other way is to register the pyarrow extension

That's what I really needed.

import geoarrow.pyarrow as ga
ga.register_extension_types()

Now, with my df coming from polars, I can just do

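from geoarrow.rust.core import GeoTable, PointArray

# df: a polars DataFrame with float "x" and "y" columns
df_geo = GeoTable.from_arrow(
    df.to_arrow().add_column(
        0, "geometry", [PointArray.from_xy(df["x"].to_arrow(), df["y"].to_arrow())]
    )
)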

I see that it says my geometry is a Struct but I thought it'd be a FixedSizeList. Is that always the case or is that related to how I constructed it?

kylebarron commented 5 months ago

> I'm trying to quit doing that ;)

Yes of course, but baby steps!

> I see that it says my geometry is a Struct but I thought it'd be a FixedSizeList. Is that always the case or is that related to how I constructed it?

GeoArrow allows either FixedSizeList or Struct for coordinate buffers. PointArray.from_xy always creates a StructArray because that's how the memory is passed in.

https://github.com/geoarrow/geoarrow-rs/pull/578 will give you more control over interleaved vs separated layout when constructing arrays from raw buffers. We should also add a helper to convert between the two layouts when you already have your arrays.