geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

pyogrio.errors.DataSourceError: No driver registered #448

Open weiji14 opened 1 month ago

weiji14 commented 1 month ago

For the past two weeks we've been running into an issue where pyogrio cannot detect GDAL drivers in our Sphinx docs build CI (https://github.com/GenericMappingTools/pygmt/issues/3301), with errors like pyogrio.errors.DataSourceError: No driver registered.

We're using geopandas=1.0.1 (pyhd8ed1ab_0) and pyogrio=0.9.0 (py312h8ad7a51_0) installed from conda-forge, which includes the OGR_GMT driver (installed in GDAL=3.9.0 (py312h86af8fa_5)). This issue has been very hard to reproduce, because everything works when we test directly but fails when the scripts are run with sphinx-build. E.g. this code example:

import geopandas as gpd
import pyogrio

pyogrio.list_drivers()
gdf = gpd.read_file(
    "https://www2.census.gov/geo/tiger/TIGER2015/PRISECROADS/tl_2015_15_prisecroads.zip",
    engine="pyogrio",
)

would consistently produce a pyogrio.errors.DataSourceError: No driver registered error when run as part of sphinx-build:

{}
ERROR 4: No driver registered./lines... [ 85%] roads.py

Traceback (most recent call last):
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 975, in execute_code_block
    is_last_expr, mem_max = _exec_and_get_memory(
                            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 807, in _exec_and_get_memory
    mem_body, _ = call_memory(
                  ^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 1594, in _sg_call_memory_noop
    return 0.0, func()
                ^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 728, in __call__
    exec(self.code, self.fake_main.__dict__)
  File "/home/user/Documents/github/pygmt/examples/gallery/lines/roads.py", line 22, in <module>
    gdf = gpd.read_file(
          ^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/geopandas/io/file.py", line 294, in _read_file
    return _read_file_pyogrio(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/geopandas/io/file.py", line 547, in _read_file_pyogrio
    return pyogrio.read_dataframe(path_or_bytes, bbox=bbox, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/pyogrio/geopandas.py", line 261, in read_dataframe
    result = read_func(
             ^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/pyogrio/raw.py", line 196, in read
    return ogr_read(
           ^^^^^^^^^
  File "pyogrio/_io.pyx", line 1239, in pyogrio._io.ogr_read
  File "pyogrio/_io.pyx", line 219, in pyogrio._io.ogr_open
pyogrio.errors.DataSourceError: No driver registered.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx/cmd/build.py", line 332, in build_main
    app = Sphinx(args.sourcedir, args.confdir, args.outputdir,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx/application.py", line 268, in __init__
    self._init_builder()
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx/application.py", line 339, in _init_builder
    self.events.emit('builder-inited')
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx/events.py", line 97, in emit
    results.append(listener.handler(self.app, *args))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_gallery.py", line 616, in generate_gallery_rst
    ) = generate_dir_rst(src_dir, target_dir, gallery_conf, seen_backrefs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 539, in generate_dir_rst
    intro, title, (t, mem) = generate_file_rst(
                             ^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 1211, in generate_file_rst
    output_blocks, time_elapsed = execute_script(
                                  ^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 1116, in execute_script
    execute_code_block(
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 988, in execute_code_block
    except_rst = handle_exception(
                 ^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx_gallery/gen_rst.py", line 664, in handle_exception
    func(  # needs leading newline to get away from iterator
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx/util/logging.py", line 184, in warning
    return super().warning(
           ^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 1930, in warning
    self.log(WARNING, msg, *args, **kwargs)
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx/util/logging.py", line 131, in log
    super().log(level, msg, *args, **kwargs)
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 1962, in log
    self.logger.log(level, msg, *args, **kwargs)
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 1609, in log
    self._log(level, msg, args, **kwargs)
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 1684, in _log
    self.handle(record)
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 1700, in handle
    self.callHandlers(record)
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 1762, in callHandlers
    hdlr.handle(record)
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 1022, in handle
    rv = self.filter(record)
         ^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/logging/__init__.py", line 858, in filter
    result = f.filter(record)
             ^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/pygmt/lib/python3.12/site-packages/sphinx/util/logging.py", line 478, in filter
    raise exc
sphinx.errors.SphinxWarning: 
../examples/gallery/lines/roads.py unexpectedly failed to execute correctly:

or produce the expected output when run directly in a Python script or Jupyter notebook:

```
{'FITS': 'rw', 'PCIDSK': 'rw', 'netCDF': 'rw', 'PDS4': 'rw', 'VICAR': 'rw', 'JP2OpenJPEG': 'r', 'PDF': 'rw', 'MBTiles': 'rw', 'TileDB': 'rw', 'BAG': 'rw', 'EEDA': 'r', 'OGCAPI': 'r', 'ESRI Shapefile': 'rw', 'MapInfo File': 'rw', 'UK .NTF': 'r', 'LVBAG': 'r', 'OGR_SDTS': 'r', 'S57': 'rw', 'DGN': 'rw', 'OGR_VRT': 'r', 'Memory': 'rw', 'CSV': 'rw', 'NAS': 'r', 'GML': 'rw', 'GPX': 'rw', 'LIBKML': 'rw', 'KML': 'rw', 'GeoJSON': 'rw', 'GeoJSONSeq': 'rw', 'ESRIJSON': 'r', 'TopoJSON': 'r', 'Interlis 1': 'rw', 'Interlis 2': 'rw', 'OGR_GMT': 'rw', 'GPKG': 'rw', 'SQLite': 'rw', 'WAsP': 'rw', 'PostgreSQL': 'rw', 'OpenFileGDB': 'rw', 'DXF': 'rw', 'CAD': 'r', 'FlatGeobuf': 'rw', 'Geoconcept': 'rw', 'GeoRSS': 'rw', 'VFK': 'r', 'PGDUMP': 'rw', 'OSM': 'r', 'GPSBabel': 'rw', 'OGR_PDS': 'r', 'WFS': 'r', 'OAPIF': 'r', 'EDIGEO': 'r', 'SVG': 'r', 'Idrisi': 'r', 'XLS': 'r', 'ODS': 'rw', 'XLSX': 'rw', 'Elasticsearch': 'rw', 'Carto': 'rw', 'AmigoCloud': 'rw', 'SXF': 'r', 'Selafin': 'rw', 'JML': 'rw', 'PLSCENES': 'r', 'CSW': 'r', 'VDV': 'rw', 'GMLAS': 'r', 'MVT': 'rw', 'NGW': 'rw', 'MapML': 'rw', 'GTFS': 'r', 'PMTiles': 'rw', 'JSONFG': 'rw', 'MiraMonVector': 'rw', 'TIGER': 'r', 'AVCBin': 'r', 'AVCE00': 'r', 'HTTP': 'r'}
        LINEARID      FULLNAME RTTYP  MTFCC  \
0  1104258643968  Puainako Exd     M  S1200
1  1103933153286   Puanako Exd     M  S1200
2  1103890709860  Puainako Exd     M  S1200
3  1104486222576     Keaau Byp     M  S1200
4  1104486197669     Keaau Byp     M  S1200

                                            geometry
0  LINESTRING (-155.11039 19.69256, -155.1107 19....
1  LINESTRING (-155.14804 19.68121, -155.14938 19...
2  LINESTRING (-155.1566 19.68084, -155.15641 19....
3  LINESTRING (-155.0313 19.62267, -155.03204 19....
4  LINESTRING (-155.02989 19.61407, -155.02981 19...
```

My guess is that the GDAL drivers are not being registered properly somehow. This was supposedly fixed in https://github.com/geopandas/pyogrio/pull/145 (see also https://github.com/geopandas/pyogrio/issues/144), but there might be certain cases where the loading doesn't happen correctly? We have a workaround right now that forces the driver load like so:

import pyogrio

pyogrio.core._register_drivers()

but given that _register_drivers is a private method, we would prefer not to rely on it. We're opening this issue to try to figure out where the GDAL driver loading logic might be failing even after #145. Unsure if putting GDALAllRegister() back in _io.pyx and/or _ogr.pyx would help, or if there is another solution we can try.
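In the meantime, one way to limit reliance on the private hook is to call it only when the public pyogrio.list_drivers() reports an empty registry. The following is a minimal sketch, not part of pyogrio: ensure_drivers is a hypothetical helper name, and both callables are passed in as arguments so the logic can be exercised without GDAL installed.

```python
from typing import Callable, Mapping


def ensure_drivers(
    list_drivers: Callable[[], Mapping[str, str]],
    register: Callable[[], None],
) -> Mapping[str, str]:
    """Return the registered drivers, re-registering once if none are found.

    Hypothetical helper: the private registration hook is only touched when
    the public check reports an empty registry.
    """
    drivers = list_drivers()
    if drivers:
        return drivers  # normal case: nothing to do

    register()  # fallback, e.g. pyogrio.core._register_drivers (private API)
    drivers = list_drivers()
    if not drivers:
        raise RuntimeError("No GDAL/OGR drivers available after re-registering")
    return drivers


# Intended use (requires pyogrio; not executed here):
#   import pyogrio
#   ensure_drivers(pyogrio.list_drivers, pyogrio.core._register_drivers)
```

This still depends on the private function, so it is a stopgap rather than a fix; it only makes the dependency explicit and easy to remove later.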

brendan-ward commented 1 month ago

Thanks for the report! This is a tricky one. It appears to be an order-of-operations issue, possibly caused by the GDAL library being loaded multiple times from different packages (e.g., pyogrio, rasterio).

I can reproduce locally by installing a Conda env based on your environment.

I'm seeing that it appears to work properly at first and then fail later. To test this, I edited geopandas/io/file.py::_read_file_pyogrio to print the drivers after importing pyogrio.

When I run the equivalent of your make html (with logging):

PYGMT_USE_EXTERNAL_DISPLAY="false" sphinx-build -b html -d _build/doctrees -j auto . _build/html -v

I see that it first successfully loads all drivers when running examples/gallery/maps/choropleth_map.py. It then runs examples/gallery/maps/tilemaps.py and I see warnings about rio, which suggests that it is loading rasterio somewhere in the stack at that point. After that, it runs examples/gallery/lines/roads.py and now lists no drivers before crashing.

If I run the failing file directly:

PYGMT_USE_EXTERNAL_DISPLAY="false" python examples/gallery/lines/roads.py

It loads all drivers successfully and works as expected.

If I comment out the contents of examples/gallery/maps/tilemaps.py and then run sphinx-build again (above command), then roads.py is successful. So - something in tilemaps.py is causing GDAL loading to get messed up.
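The manual process above (run the scripts in order, check the driver list after each one) can be sketched as a small loop. This is only an illustration under stated assumptions: first_step_losing_drivers is a hypothetical helper, and the steps and the driver-listing function are passed in as plain callables. In a real run each step might be functools.partial(runpy.run_path, path) for a gallery script, and list_drivers would be pyogrio.list_drivers.

```python
from typing import Callable, Iterable, Mapping, Optional, Tuple


def first_step_losing_drivers(
    steps: Iterable[Tuple[str, Callable[[], object]]],
    list_drivers: Callable[[], Mapping[str, str]],
) -> Optional[str]:
    """Run named steps in order and return the name of the first step after
    which the driver registry is empty, or None if it never empties."""
    if not list_drivers():
        raise RuntimeError("Drivers are already missing before any step ran")
    for name, step in steps:
        step()
        if not list_drivers():
            return name  # this step left the registry empty
    return None
```

Because all steps share one interpreter, this mimics what sphinx-build does with the gallery scripts, which is what makes the ordering visible.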

I've isolated the problematic line that changes the behavior of loading drivers in pyogrio. If we import pyogrio and list drivers before this, they are as expected. If we import pyogrio after this line, no drivers are loaded.

I don't use xarray, so I'm not able to go much further down this rabbit hole. Based on what I'm seeing, this isn't an error that is caused directly by pyogrio (though please follow up here if you discover otherwise). It seems that multiple libraries load GDAL differently and, depending on the order of operations, leave GDAL's driver registry in a bad state when subsequent packages try to load GDAL.

brendan-ward commented 1 month ago

@snowman2 do you have any ideas about what might be causing this order-of-operations driver loading issue involving rioxarray? It seems that loading GDAL drivers works fine in pyogrio before a call to rioxarray, and then does not work properly afterward. That is, if rioxarray is called before pyogrio, we don't get drivers from GDAL, but if it is called after, then things work properly.

snowman2 commented 1 month ago

rioxarray imports rasterio. Did you try importing rasterio before/after pyogrio?
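An import-order question like this is easiest to answer when each ordering runs in a fresh interpreter, so that state from one attempt cannot leak into the next. Below is a rough sketch using only the standard library; probe_import_orders is a hypothetical helper, and the probe statement is whatever check you want to run after the imports (for this issue, something like print(len(pyogrio.list_drivers())) would be a natural choice).

```python
import itertools
import subprocess
import sys
from typing import Dict, Sequence, Tuple


def probe_import_orders(
    modules: Sequence[str], probe: str
) -> Dict[Tuple[str, ...], str]:
    """Import `modules` in every possible order, each in a fresh interpreter,
    then run the `probe` statement and collect its output (or error text)."""
    results = {}
    for order in itertools.permutations(modules):
        # Build a tiny script: one import per line, followed by the probe.
        code = "\n".join([f"import {name}" for name in order] + [probe])
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
        )
        output = proc.stdout if proc.returncode == 0 else proc.stderr
        results[order] = output.strip()
    return results


# Intended use (requires both packages; not executed here):
#   probe_import_orders(
#       ["rasterio", "pyogrio"],
#       "print(len(pyogrio.list_drivers()))",
#   )
```

The number of orderings grows factorially, so this is only practical for a handful of modules.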

seisman commented 1 month ago

Here is a much smaller working example (compared to building the PyGMT documentation) to reproduce the issue:

import geopandas as gpd

gdf = gpd.read_file("https://www2.census.gov/geo/tiger/TIGER2015/PRISECROADS/tl_2015_15_prisecroads.zip")

import pygmt

fig = pygmt.Figure()
fig.tilemap(region=[-157.84, -157.8, 21.255, 21.285], projection="M12c", zoom=14, frame="afg")
fig.show()

gdf = gpd.read_file("https://www2.census.gov/geo/tiger/TIGER2015/PRISECROADS/tl_2015_15_prisecroads.zip")

brendan-ward commented 1 month ago

Importing rasterio first works fine (as does simply importing rioxarray).

Thanks for the smaller example @seisman .

It looks like an offending section in pygmt/src/tilemap.py is this one (if I comment it out, your example works as expected):

with Session() as lib:
    with lib.virtualfile_in(check_kind="raster", data=raster) as vingrd:
        lib.call_module(
            module="grdimage", args=build_arg_list(kwargs, infile=vingrd)
        )

weiji14 commented 1 month ago

Thanks so much @brendan-ward for taking the time to debug this! Your comment at https://github.com/geopandas/pyogrio/issues/448#issuecomment-2219139470 makes a lot of sense (to me at least). I spent hours debugging this and did notice that the first choropleth_map.py example worked fine while roads.py didn't, but I didn't realize that tilemaps.py in the middle was possibly causing the issue.

@seisman, it seems like GMT might also be calling GDALAllRegister (e.g. at https://github.com/GenericMappingTools/gmt/blob/804d7674b890fe0e83411b1083ddbb2128b49847/src/gmt_gdalread.c#L721), would the interaction between pyogrio/GMT/rasterio(GDAL) be causing issues?

seisman commented 1 month ago

> it seems like GMT might also be calling GDALAllRegister (e.g. at https://github.com/GenericMappingTools/gmt/blob/804d7674b890fe0e83411b1083ddbb2128b49847/src/gmt_gdalread.c#L721), would the interaction between pyogrio/GMT/rasterio(GDAL) be causing issues?

It's likely. Here is an even smaller example that reproduces it without calling Figure.tilemap or load_tile_map, which means it's not directly related to rioxarray/rasterio. To me, it seems that as long as GMT tries to read an image, it crashes:

import geopandas as gpd

gdf = gpd.read_file("https://www2.census.gov/geo/tiger/TIGER2015/PRISECROADS/tl_2015_15_prisecroads.zip")

import pygmt

fig = pygmt.Figure()
fig.grdimage("@earth_day_01d")
fig.show()

gdf = gpd.read_file("https://www2.census.gov/geo/tiger/TIGER2015/PRISECROADS/tl_2015_15_prisecroads.zip")

The upstream documentation says (https://gdal.org/api/raster_c_api.html#_CPPv415GDALAllRegisterv):

This function should generally be called once at the beginning of the application.

but apparently, it's called multiple times even in GMT.
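To confirm which call disturbs the registry, the suspect call can be wrapped in a check that compares the driver names before and after it. The following is a minimal sketch under stated assumptions: drivers_preserved is a hypothetical context manager, and the listing function is passed in (for this issue it would be pyogrio.list_drivers, with fig.grdimage("@earth_day_01d") inside the with block).

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Mapping


@contextmanager
def drivers_preserved(
    list_drivers: Callable[[], Mapping[str, str]],
) -> Iterator[None]:
    """Raise RuntimeError if any driver present on entry is missing on exit."""
    before = set(list_drivers())
    yield
    lost = before - set(list_drivers())
    if lost:
        sample = sorted(lost)[:5]  # keep the message short
        raise RuntimeError(f"{len(lost)} driver(s) lost, e.g. {sample}")


# Intended use (requires pyogrio and pygmt; not executed here):
#   with drivers_preserved(pyogrio.list_drivers):
#       fig.grdimage("@earth_day_01d")
```

If the wrapped block itself raises, that exception propagates and the comparison is skipped, so the check only reports on calls that otherwise succeed.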