OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org
Other
4.77k stars 2.5k forks source link

Potential inconsistencies with Arrow/Parquet driver output #5670

Closed paleolimbot closed 2 years ago

paleolimbot commented 2 years ago

@rouault thank you for opening https://github.com/paleolimbot/geoarrow/issues/5 - I did a rewrite of my Arrow generation and am just getting to checking my files against those from the GDAL driver! Most of the differences I see so far are my fault (see https://github.com/paleolimbot/geoarrow/pull/10) or are not finalized in the specs yet (e.g., Arrow driver metadata is still outputting version 0.1 specs, "crs": null is transformed to a missing "crs" key).

With "edges": "spherical" I expected an error, a warning, or for the metadata to be passed through with ogr2ogr. I don't see any of those with:

ogr2ogr test.parquet /vsicurl/https://github.com/paleolimbot/geoarrow/raw/algin-metadata-with-gdal/inst/example_parquet/nc_spherical-wkb.parquet
R -e 'arrow::read_parquet("test.parquet", as_data_frame = FALSE)$metadata'

With GeometryCollection with an M attribute, the bbox seems to be missing an M (I know, M support isn't official, but all the other types seem to have an M bit in the bounding box).

OGR_PARQUET_ALLOW_ALL_DIMS=YES ogr2ogr test.parquet /vsicurl/https://github.com/paleolimbot/geoarrow/raw/algin-metadata-with-gdal/inst/example_parquet/geometrycollection_m-default.parquet
R -e 'arrow::read_parquet("test.parquet", as_data_frame = FALSE)$metadata'

(running GDAL on the osgeo/gdal:latest docker image)

rouault commented 2 years ago

With "edges": "spherical" I expected an error, a warning, or for the metadata to be passed through with ogr2ogr

OGR data model has no specific support for edges:spherical. It is just ignored. Yes perhaps exposing it as layer metadata is the best thing to do

With GeometryCollection with an M attribute, the bbox seems to be missing an M

the bbox in the json metadata ? It is not clear to me that the M dimension should be in it. It doesn't incorporate the Z currently as well.

paleolimbot commented 2 years ago

the bbox in the json metadata ? It is not clear to me that the M dimension should be in it. It doesn't incorporate the Z currently as well.

That's what I thought at first too (it didn't seem like Z or M bounds would be particularly useful). I asked @jorisvandenbossche and it seemed like the intention was to add that in...I'll open an issue in the geoparquet repo asking for clarification.

The only other thing I see is that it looks like POINT EMPTY (including Z, M, and ZM variants) are getting translated to all zeros when reading the WKT versions of feather and IPC and writing GEOARROW:

OGR_ARROW_ALLOW_ALL_DIMS=YES ogr2ogr test.feather /vsicurl/https://github.com/paleolimbot/geoarrow/raw/algin-metadata-with-gdal/inst/example_feather/point-wkt.feather
R -e 'arrow::read_feather("test.feather")$geometry'
<fixed_size_list<double, 2>[3]>
[[1]]
[1] 30 10

[[2]]
[1] 0 0

[[3]]
NULL

(This isn't the case for the WKB or geoarrow-encoded files, which read to all NaNs, which is what I expected).

rouault commented 2 years ago

Points adressed per https://github.com/OSGeo/gdal/pull/5673

paleolimbot commented 2 years ago

Awesome! Thank you! I dropped writing ZM bounds until that's in a spec somewhere. I've been slow to get a GDAL development build workflow going but will do it soon to make sure I'm keeping up with the latest (particularly with the geoarrow encoding).