geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

Read with GeoArrow metadata for GDAL >= 3.8 #366

Closed: kylebarron closed this 4 months ago

kylebarron commented 4 months ago

Resolves half of https://github.com/geopandas/pyogrio/issues/345. In https://github.com/geopandas/pyogrio/issues/345#issuecomment-1909638656, @jorisvandenbossche suggested switching to GeoArrow metadata. I'd also propose that we make this the default.
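As a rough sketch of what "for GDAL >= 3.8" and "make this the default" could mean in practice: GDAL's Arrow stream API (OGR_L_GetArrowStream) accepts a GEOMETRY_METADATA_ENCODING option (OGC or GEOARROW) starting with GDAL 3.8, so a reader would only request GEOARROW when the running GDAL is new enough. The helper name `arrow_stream_options` and its `use_geoarrow` parameter are hypothetical, not pyogrio's API, and the GDAL version is assumed to arrive as a `(major, minor, patch)` tuple.

```python
def arrow_stream_options(gdal_version, use_geoarrow=True):
    """Build GDAL Arrow stream options as "KEY=VALUE" strings.

    Hypothetical helper. GEOMETRY_METADATA_ENCODING is only recognized by
    GDAL >= 3.8, so it is emitted only when the running GDAL is new enough.
    `gdal_version` is assumed to be a (major, minor, patch) tuple.
    """
    options = []
    if use_geoarrow and tuple(gdal_version) >= (3, 8, 0):
        # Ask GDAL to tag the geometry column as geoarrow.wkb with a PROJJSON crs
        options.append("GEOMETRY_METADATA_ENCODING=GEOARROW")
    return options


print(arrow_stream_options((3, 8, 3)))  # ['GEOMETRY_METADATA_ENCODING=GEOARROW']
print(arrow_stream_options((3, 7, 3)))  # []
```

Defaulting `use_geoarrow` to `True` mirrors the proposal to make GeoArrow metadata the default; a user-facing flag would simply expose that parameter.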

A test is added to verify that the extension metadata on the geometry field is set: the extension name should be geoarrow.wkb, and the extension metadata should include a crs field containing PROJJSON, which I assert is EPSG:4326 for this test dataset.
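A minimal sketch of the kind of check described, written against plain dictionaries so it runs without GDAL or a test file. It assumes the field metadata has the `dict[bytes, bytes]` shape that pyarrow's `Field.metadata` returns, with the extension name under `ARROW:extension:name` and a JSON object holding a `crs` PROJJSON object under `ARROW:extension:metadata`. The helper names are hypothetical and the PROJJSON in the example is trimmed for illustration; this is not the PR's actual test.

```python
import json


def check_geoarrow_wkb_field(metadata):
    """Validate GeoArrow WKB extension metadata and return the parsed CRS.

    `metadata` is assumed to be a dict of bytes -> bytes (as from pyarrow's
    Field.metadata); the crs value is assumed to be a PROJJSON object.
    """
    if metadata is None:
        raise ValueError("geometry field has no metadata")
    name = metadata.get(b"ARROW:extension:name")
    if name != b"geoarrow.wkb":
        raise ValueError(f"unexpected extension name: {name!r}")
    ext = json.loads(metadata.get(b"ARROW:extension:metadata", b"{}"))
    crs = ext.get("crs")
    if crs is None:
        raise ValueError("extension metadata has no 'crs' field")
    return crs


def is_epsg(crs, code):
    """Check the PROJJSON top-level `id` object for a given EPSG code."""
    ident = crs.get("id", {})
    return ident.get("authority") == "EPSG" and ident.get("code") == code


# Illustrative metadata of the expected shape (PROJJSON heavily trimmed)
example = {
    b"ARROW:extension:name": b"geoarrow.wkb",
    b"ARROW:extension:metadata": json.dumps(
        {"crs": {"type": "GeographicCRS", "name": "WGS 84",
                 "id": {"authority": "EPSG", "code": 4326}}}
    ).encode(),
}
crs = check_geoarrow_wkb_field(example)
assert is_epsg(crs, 4326)
```

Checking the PROJJSON `id` rather than comparing the whole object keeps the assertion stable if GDAL or PROJ add or reorder fields in the full CRS definition.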

@brendan-ward would you still prefer a flag that lets the user control this? Ref https://github.com/geopandas/pyogrio/issues/345#issuecomment-1908878875

kylebarron commented 4 months ago

This will be exciting to get working with lonboard, a visualization library I'm working on. I added support for parsing geoarrow.wkb input, and with the CRS information it will do multithreaded reprojection to WGS84, parallelized over Arrow chunks (conveniently, GDAL will handle the chunking into row groups). So plotting data from a file should be really fast. Today the WKB -> GeoArrow parsing still goes through shapely, which is a relative bottleneck, though it is certainly fast enough.
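The chunk-parallel idea can be sketched in pure Python: each chunk of WKB values is decoded into separate x and y lists (a GeoArrow-style coordinate layout), passed through a transform, and the chunks are processed in a thread pool whose `map` keeps results in chunk order. This is not lonboard's implementation: only 2D WKB points are decoded, and the transform (`offset` below) is a hypothetical stand-in for a real reprojection to WGS84 (for example with pyproj). Threads only speed this up when the per-chunk work releases the GIL, which pure-Python decoding does not, so a real implementation would do this work in native code.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def decode_wkb_point(buf):
    """Decode a 2D WKB Point into (x, y). Only geometry type 1 is handled."""
    fmt = "<" if buf[0] == 1 else ">"  # byte-order flag: 1 = little, 0 = big endian
    (geom_type,) = struct.unpack_from(fmt + "I", buf, 1)
    if geom_type != 1:
        raise ValueError(f"only 2D Point (type 1) supported, got {geom_type}")
    return struct.unpack_from(fmt + "dd", buf, 5)


def process_chunk(chunk, transform):
    """Decode one chunk of WKB points into separate x/y lists, then transform."""
    xs, ys = [], []
    for wkb in chunk:
        tx, ty = transform(*decode_wkb_point(wkb))
        xs.append(tx)
        ys.append(ty)
    return xs, ys


def process_chunks(chunks, transform, max_workers=4):
    """Process all chunks in a thread pool; results keep the input chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: process_chunk(c, transform), chunks))


def offset(x, y):
    # Hypothetical stand-in for a real CRS transform (not a reprojection)
    return x + 1.0, y + 1.0


chunks = [
    [struct.pack("<BIdd", 1, 1, 10.0, 20.0)],
    [struct.pack("<BIdd", 1, 1, 30.0, 40.0), struct.pack(">BIdd", 0, 1, 5.0, 6.0)],
]
print(process_chunks(chunks, offset))
# [([11.0], [21.0]), ([31.0, 6.0], [41.0, 7.0])]
```

Working per chunk means each worker only needs its own slice of the data, which is why GDAL's row-group chunking lines up well with this kind of parallelism.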

brendan-ward commented 4 months ago

Could you please add an entry to the changelog?