geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

BUG: MacOS Arm64 Pyogrio + GeoPandas feather I/O issues via pip #144

Closed: brendan-ward closed this issue 1 year ago

brendan-ward commented 2 years ago

(M1, macOS 12.4, Python 3.10, GEOS 3.11, GDAL 3.5.1)

We don't currently have Arm64 (M1) wheels, so this requires building pyogrio from source.

Building pyogrio from source in a virtual environment (created with python -m venv <name> or with poetry) produces what appears to be a working package, but writing fails when it is used directly after reading from feather. (shapely, pygeos, and fiona were all built from source against the local GDAL / GEOS.)

For example, when building within the root of this repo on Arm64 in that environment:

python setup.py build_ext --inplace
pip install pytest pandas pyproj
pip install shapely --no-binary shapely
pip install pygeos --no-binary pygeos
pip install --no-deps geopandas

Then running pytest pyogrio/tests passes all tests.

Using pyogrio directly, or via GeoPandas, works fine:

from geopandas import read_file
from pyogrio import read_dataframe, write_dataframe

df = read_dataframe(
    "pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp"
)
write_dataframe(df, "/tmp/test.shp")

df = read_file(
    "pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp",
    engine="pyogrio",
)
df.to_file("/tmp/test.shp", driver="ESRI Shapefile", engine="pyogrio")

However, if the data frame is first read from feather, we seem to run into NULL pointer / driver issues.

from geopandas import read_feather
from pyogrio import write_dataframe

df = read_feather("test.feather")
write_dataframe(df, "/tmp/test.shp")

It fails while trying to obtain the driver needed to create the output data source:

Traceback (most recent call last):
  File "pyogrio/_io.pyx", line 1061, in pyogrio._io.ogr_create
    ogr_driver = exc_wrap_pointer(GDALGetDriverByName(driver_c))
  File "pyogrio/_err.pyx", line 180, in pyogrio._err.exc_wrap_pointer
    raise NullPointerError(-1, -1, "NULL pointer error")
pyogrio._err.NullPointerError: NULL pointer error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bcward/projects/pyogrio/.scratch/test_install.py", line 22, in <module>
    write_dataframe(df, "/tmp/test.shp")
  File "/Users/bcward/projects/pyogrio/pyogrio/geopandas.py", line 312, in write_dataframe
    write(
  File "/Users/bcward/projects/pyogrio/pyogrio/raw.py", line 193, in write
    ogr_write(
  File "pyogrio/_io.pyx", line 1227, in pyogrio._io.ogr_write
    ogr_dataset = ogr_create(path_c, driver_c)
  File "pyogrio/_io.pyx", line 1064, in pyogrio._io.ogr_create
    raise DataSourceError(f"Could not obtain driver: {driver_c.decode('utf-8')} (check that it was installed correctly into GDAL)")
pyogrio.errors.DataSourceError: Could not obtain driver: ESRI Shapefile (check that it was installed correctly into GDAL)

To make this extra confusing, if we add import fiona at the top, everything works. However, uninstalling fiona does not make it work (nor does uninstalling pygeos or building pyproj from source).

brendan-ward commented 2 years ago

This is sensitive to order of operations. Regardless of import order, if we first read a file using pyogrio before reading a feather file using GeoPandas, everything works properly. If we first read the feather file, we fail when writing the output file using pyogrio. Likewise, if we call list_drivers before reading from feather, everything works properly.

My hunch is that we need to have something that calls GDALAllRegister() in Cython as part of the initial import of pyogrio.

GDAL is from Homebrew, which does not yet include the GeoParquet driver, so that doesn't seem to be the source of the conflict. Not sure whether reading from feather itself, or something in pyarrow, is causing the GDAL drivers to get messed up.
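A toy, pure-Python model of the hunch above may help make the order dependence concrete. It is hypothetical and is not pyogrio's code: every class and method name here is made up for illustration. It assumes that, as in GDAL before GDALAllRegister() is called, the driver registry starts empty. It also assumes that the read path and list_drivers register drivers while the write path only looks them up. Under those assumptions, writing before any registering call fails no matter what else ran first (such as a feather read through pyarrow), and registering at import removes the dependence.

```python
"""Toy model of lazy driver registration (hypothetical; not pyogrio's code)."""
from __future__ import annotations

# Illustrative subset of drivers a real GDAL build would provide.
KNOWN_DRIVERS = ("ESRI Shapefile", "GPKG")


class ToyIOModule:
    """Mimics an I/O module whose driver registry starts empty, as GDAL's
    does before GDALAllRegister() has been called."""

    def __init__(self, register_on_import: bool = False) -> None:
        self._drivers: set[str] = set()
        if register_on_import:
            # The proposed fix: register once when the module is imported.
            self._all_register()

    def _all_register(self) -> None:
        # Stand-in for GDALAllRegister(); safe to call repeatedly.
        self._drivers.update(KNOWN_DRIVERS)

    def _get_driver_by_name(self, name: str) -> str | None:
        # Stand-in for GDALGetDriverByName(); None plays the role of NULL.
        return name if name in self._drivers else None

    def read(self, path: str) -> str:
        # Assumed read path: registers drivers before opening the file.
        self._all_register()
        return f"data from {path}"

    def list_drivers(self) -> list[str]:
        # Listing drivers also triggers registration.
        self._all_register()
        return sorted(self._drivers)

    def write(self, path: str, driver: str) -> str:
        # Assumed write path: only looks the driver up, never registers it.
        if self._get_driver_by_name(driver) is None:
            raise RuntimeError(f"Could not obtain driver: {driver}")
        return f"wrote {path} with {driver}"


if __name__ == "__main__":
    # Writing first (e.g. after a feather read that never touches GDAL) fails.
    buggy = ToyIOModule()
    try:
        buggy.write("/tmp/test.shp", "ESRI Shapefile")
    except RuntimeError as exc:
        print(exc)  # Could not obtain driver: ESRI Shapefile

    # Any registering call beforehand hides the bug.
    buggy.list_drivers()
    print(buggy.write("/tmp/test.shp", "ESRI Shapefile"))

    # Registering at import removes the order dependence entirely.
    fixed = ToyIOModule(register_on_import=True)
    print(fixed.write("/tmp/test.gpkg", "GPKG"))
```

In this model the feather read is incidental. What matters is whether any registering call ran before the write.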

brendan-ward commented 2 years ago

This issue is also present in pyogrio 0.4.0 from conda. Not sure how we didn't see this before; reading from feather and writing to GIS files is a very common workflow for me.

jorisvandenbossche commented 2 years ago

Some time ago I diagnosed an issue with the fiona + arrow combination (https://github.com/conda-forge/gdal-feedstock/issues/592) where importing fiona messed up some symbols for pyarrow. But that was the other way around, and it should be solved by now anyway.

jldorscheidt commented 2 years ago

Hi, commenting on this thread because we are encountering the same issue with write_dataframe. It arises on Ubuntu (x86_64), in a venv with pyogrio and geopandas installed.

Environment: Python 3.8

pip install pyogrio
pip install geopandas

Minimal example

from pyogrio import write_dataframe
import geopandas
from shapely.geometry import Point

write_dataframe(geopandas.GeoDataFrame({'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}),
                '/tmp/write_w_pyogrio.gpkg')

This raises the following error:

Traceback (most recent call last):
  File "pyogrio/_io.pyx", line 1061, in pyogrio._io.ogr_create
  File "pyogrio/_err.pyx", line 180, in pyogrio._err.exc_wrap_pointer
pyogrio._err.NullPointerError: NULL pointer error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/joost/.config/JetBrains/PyCharmCE2022.2/scratches/scratch_78.py", line 7, in <module>
    write_dataframe(gdf, '/tmp/write_w_pyogrio.gpkg')
  File "/home/joost/PycharmProjects/bug_report/lib/python3.8/site-packages/pyogrio/geopandas.py", line 312, in write_dataframe
    write(
  File "/home/joost/PycharmProjects/bug_report/lib/python3.8/site-packages/pyogrio/raw.py", line 193, in write
    ogr_write(
  File "pyogrio/_io.pyx", line 1227, in pyogrio._io.ogr_write
  File "pyogrio/_io.pyx", line 1064, in pyogrio._io.ogr_create
pyogrio.errors.DataSourceError: Could not obtain driver: GPKG (check that it was installed correctly into GDAL)

Force-installing fiona or geopandas with --no-binary does not solve the problem.

However, overwriting an existing geometry file, or calling pyogrio.list_drivers() before writing, does not raise an error:

from pyogrio import write_dataframe
import pyogrio
import geopandas
from shapely.geometry import Point

pyogrio.list_drivers()
write_dataframe(geopandas.GeoDataFrame({'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}),
                '/tmp/write_w_pyogrio.gpkg')

martinfleis commented 2 years ago

@jldorscheidt I believe that this will be resolved by #145.

brendan-ward commented 2 years ago

@jldorscheidt are you able to build from source using the latest changes in main? Would be good to know if that cleared up the issue on x86_64 architectures as well.

We have a few more things that I'd like to see merged before we cut our next release; apologies for the delay there.

jldorscheidt commented 2 years ago

@brendan-ward, I just checked writing to gpkg after building from source in main, and the error does not appear anymore. Looking forward to the next release!

brendan-ward commented 1 year ago

Resolved by #145