geoarrow / geoarrow-python

Python implementation of the GeoArrow specification
http://geoarrow.org/geoarrow-python/
Apache License 2.0

fix(geoarrow-pyarrow): Better geometry_types, geography column support, and default geometry column name #35

Closed paleolimbot closed 10 months ago

paleolimbot commented 10 months ago

As a follow-up to #34!

geography column support for "compatible" Parquet files:

from geoarrow.pyarrow import io
import pyarrow as pa
from pyarrow import parquet

table = pa.table([pa.array(["LINESTRING (0 0, 1 1)"])], ["geography"])
parquet.write_table(table, "test.parquet")
table_out = io.read_geoparquet_table("test.parquet")
table_out["geography"].type
#> WktType(spherical geoarrow.wkt <PROJJSON:{"$schema": "https://proj...>)

geometry_types behaviour:

import geoarrow.pyarrow as ga
from geoarrow.pyarrow import io
import pyarrow as pa
from pyarrow import parquet

table = pa.table([ga.as_geoarrow(["LINESTRING (0 0, 1 1)"])], ["geometry"])

# Default: write geometry_types if no computation is required to do so
io.write_geoparquet_table(table, "test.parquet")
parquet.read_schema("test.parquet").metadata[b"geo"]
#> b'{"version": "1.0.0", "primary_column": "geometry", "columns": {"geometry": {"encoding": "WKB", "geometry_types": ["LineString"], "crs": null}}}'

# ...force omitting with write_geometry_types=False
io.write_geoparquet_table(table, "test.parquet", write_geometry_types=False)
parquet.read_schema("test.parquet").metadata[b"geo"]
#> b'{"version": "1.0.0", "primary_column": "geometry", "columns": {"geometry": {"encoding": "WKB", "geometry_types": [], "crs": null}}}'

# ...force including with write_geometry_types=True, even when this involves an O(n) calculation
table = pa.table([ga.as_wkb(["LINESTRING (0 0, 1 1)"])], ["geometry"])
io.write_geoparquet_table(table, "test.parquet", write_geometry_types=True)
parquet.read_schema("test.parquet").metadata[b"geo"]
#> b'{"version": "1.0.0", "primary_column": "geometry", "columns": {"geometry": {"encoding": "WKB", "geometry_types": ["LineString"], "crs": null}}}'
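To compare the three outputs above without reading raw bytes, the `geo` metadata value can be decoded with the standard library. The snippet below is a minimal sketch: `summarize_geo_metadata` is a hypothetical helper (not part of geoarrow-pyarrow), and the two byte strings are copied from the outputs shown above rather than read from a file.

```python
import json

# Metadata values copied from the outputs above (default write, then
# write_geometry_types=False). In practice they would come from
# parquet.read_schema("test.parquet").metadata[b"geo"].
GEO_DEFAULT = b'{"version": "1.0.0", "primary_column": "geometry", "columns": {"geometry": {"encoding": "WKB", "geometry_types": ["LineString"], "crs": null}}}'
GEO_OMITTED = b'{"version": "1.0.0", "primary_column": "geometry", "columns": {"geometry": {"encoding": "WKB", "geometry_types": [], "crs": null}}}'


def summarize_geo_metadata(raw):
    """Hypothetical helper: map each geometry column to its geometry_types list."""
    meta = json.loads(raw)  # json.loads accepts bytes as well as str
    return {
        name: spec.get("geometry_types", [])
        for name, spec in meta["columns"].items()
    }


print(summarize_geo_metadata(GEO_DEFAULT))  # {'geometry': ['LineString']}
print(summarize_geo_metadata(GEO_OMITTED))  # {'geometry': []}
```

An empty list here means only that no types were recorded in the file metadata; it says nothing about which geometry types the column actually contains.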

Default geometry column name when GDAL says the geometry column name is "":

from geoarrow.pyarrow import io

io.read_pyogrio_table("https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_point.fgb.zip")
pyarrow.Table
OBJECTID: int64
FEAT_CODE: string
BASIN_NAME: string
RIVER: string
HID: string
geometry: extension<geoarrow.wkb<WkbType>>
----
OBJECTID: [[1,2,3,4,5,...,42,43,44,45,46]]
FEAT_CODE: [["WABA30","WABA30","WABA30","WABA30","WABA30",...,"WABA30","WABA30","WABA30","WABA30","WABA30"]]
BASIN_NAME: [["01EB000","01EC000","01EA000","01DA000","01ED000",...,"01FE000","01FB000","01FC000","01FD000","01EQ000"]]
RIVER: [["BARRINGTON/CLYDE","ROSEWAY/SABLE/JORDAN","TUSKET RIVER","METEGHAN","MERSEY",...,"INDIAN","MARGAREE","CHETICAMP RIVER","WRECK COVE","NEW HBR/SALMON"]]
HID: [["919201D6D5094930ABF2D49BCEA27FC9","5293753C835142939326618A9513D35E","A7592A93F7A44022BCEC9D958BF46415","47EF929A586E4B429F51DC7A72BFEFE8","425CA3DB74F449E6AD6FC1E83130C813",...,"6A07E160F7C0425893FED13BAD9C742B","B4104D50CA2942C084E38F448FC4F059","133E00BAF7594C4FB5A23A4B30A6E8F1","B0B72EECDB734AB5BB20A668667AC5D5","7CD66433BC7C45A69E9506453C213ED9"]]
geometry: [[...]]
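
The fallback shown above can be pictured roughly as follows. This is only an illustration under stated assumptions: `resolve_geometry_name` is a hypothetical helper, not the function used in geoarrow-pyarrow, and the only detail taken from the output above is that the column ends up being called "geometry".

```python
def resolve_geometry_name(reported_name, default="geometry"):
    """Hypothetical helper: choose a usable geometry column name.

    Assumes the reader reports "" (or None) when the source has no
    named geometry column, and falls back to `default` in that case.
    """
    return reported_name if reported_name else default


print(resolve_geometry_name(""))      # geometry
print(resolve_geometry_name("geom"))  # geom
```

A non-empty name reported by the reader is passed through unchanged, so only unnamed geometry columns are affected.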

codecov[bot] commented 10 months ago

Codecov Report

Merging #35 (471490c) into main (b26c674) will decrease coverage by 0.02%. The diff coverage is 97.14%.

@@            Coverage Diff             @@
##             main      #35      +/-   ##
==========================================
- Coverage   95.07%   95.06%   -0.02%     
==========================================
  Files          10       10              
  Lines        1401     1418      +17     
==========================================
+ Hits         1332     1348      +16     
- Misses         69       70       +1     

Files                                          Coverage Δ
geoarrow-pyarrow/src/geoarrow/pyarrow/io.py    98.79% <97.14%> (-0.54%) ↓

jorisvandenbossche commented 9 months ago

Nice, thanks!