geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License
read_dataframe with POINT EMPTY interprets geometry as None #436

Open mwtoews opened 6 days ago

mwtoews commented 6 days ago

While preparing #435, I noticed that write_dataframe() will write a dataframe containing (e.g.) a "POINT EMPTY" geometry without complaint. However, read_dataframe() reads this geometry back as None, so the two GeoDataFrames don't round-trip. E.g.:

from pyogrio.geopandas import read_dataframe, write_dataframe
import geopandas as gp
from geopandas.array import from_wkt

expected = gp.GeoDataFrame({"x": [0]}, geometry=from_wkt(["POINT EMPTY"]), crs=4326)
print(expected)
#    x     geometry
# 0  0  POINT EMPTY

filename = "/tmp/test.shp"
write_dataframe(expected, filename)
df = read_dataframe(filename)
print(df)
#    x geometry
# 0  0     None

Note this is the same as fiona, e.g.:

gp.read_file(filename, engine="fiona")

returns the same result. Raw fiona doesn't do much better, except that it identifies the geometry type in the schema:

import fiona
with fiona.open(filename) as ds:
    print(ds.meta["schema"])
    print([(idx, feat.geometry) for idx, feat in ds.items()])
# {'properties': {'x': 'int:18'}, 'geometry': 'Point'}
# [(0, None)]

theroggy commented 4 days ago

Shapefile cannot distinguish between NULL/None values and POINT EMPTY, and the choice was made to return NULL/None when reading.

GeoPackage, for example, does support that distinction, so there you will get proper round-tripping:

import tempfile
from pyogrio.geopandas import read_dataframe, write_dataframe
import geopandas as gp
import shapely

for geom in [shapely.from_wkt("POINT EMPTY"), None]:
    for suffix in [".shp", ".gpkg"]:
        gdf = gp.GeoDataFrame({"x": [0]}, geometry=[geom], crs=4326)

        filename = f"{tempfile.gettempdir()}/test{suffix}"
        write_dataframe(gdf, filename)
        df = read_dataframe(filename)
        print(f"{suffix=}, {geom=}:  {df.geometry.iloc[0]}")
        # suffix='.shp', geom=<POINT EMPTY>:  None
        # suffix='.gpkg', geom=<POINT EMPTY>:  POINT EMPTY
        # suffix='.shp', geom=None:  None
        # suffix='.gpkg', geom=None:  None
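
If a round-trip check has to pass for formats like Shapefile as well, one possible approach is to treat empty geometries and None as equivalent before comparing. The sketch below is only an illustration: `normalize_empty` and `same_after_roundtrip` are hypothetical helper names, not part of pyogrio or geopandas, and the code assumes only that each geometry object exposes an `is_empty` attribute (as shapely geometries do) and supports `==`.

```python
def normalize_empty(geom):
    """Return None for a missing or empty geometry, else the geometry unchanged.

    Hypothetical helper: relies only on the `is_empty` attribute,
    which shapely geometries expose.
    """
    if geom is None or getattr(geom, "is_empty", False):
        return None
    return geom


def same_after_roundtrip(written, read):
    """Compare two geometry sequences, treating empty and None as equal."""
    written = [normalize_empty(g) for g in written]
    read = [normalize_empty(g) for g in read]
    if len(written) != len(read):
        return False
    return all(
        # Both missing/empty, or both present and equal.
        (w is None and r is None)
        or (w is not None and r is not None and w == r)
        for w, r in zip(written, read)
    )
```

With the dataframes from the first example, this could be called as `same_after_roundtrip(expected.geometry, df.geometry)`. Note that this deliberately discards the empty-vs-None distinction, so it is not suitable for formats such as GeoPackage where that distinction is preserved and may matter.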