geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

ENH: Suppressing GDAL errors #289

Open cheginit opened 1 year ago

cheginit commented 1 year ago

GDAL's Python bindings have an option to temporarily suppress all warnings:


from osgeo import gdal

with gdal.ExceptionMgr(useExceptions=False), gdal.quiet_errors():
    # some operation

This is gdal's quiet_errors function:

@contextlib.contextmanager
def quiet_errors():
    """Temporarily install an error handler that silents all warnings and errors.

    Returns
    -------
        A context manager

    Example
    -------

        with gdal.ExceptionMgr(useExceptions=False), gdal.quiet_errors():
            gdal.Error(gdal.CE_Failure, gdal.CPLE_AppDefined, "you will never see me")
    """
    PushErrorHandler("CPLQuietErrorHandler")
    try:
        yield
    finally:
        PopErrorHandler()

I haven't been able to find similar functionality in pyogrio. It would be very helpful, since the warning messages can be very extensive!

brendan-ward commented 1 year ago

Are you wanting to hide only the warnings emitted by GDAL, or also hide warnings emitted by pyogrio when using the GDAL API? We also emit our own warnings that may indicate certain issues.

Fatal errors using GDAL are converted to Python exceptions instead of warnings; you can sidestep some of those with try / except blocks but generally they indicate something failed badly.

cheginit commented 1 year ago

I want to hide the warnings/errors when using pyogrio as the engine for reading files with geopandas. Seeing those warnings and errors is useful only once, so I can investigate and make the necessary changes. But I would rather not see them when I rerun the code. When I use gdal's quiet_errors with gpd.read_file, it works well and hides the warnings, but I would prefer not to have osgeo as a hard dependency.

brendan-ward commented 1 year ago

Can you give us some examples of warnings / errors emitted by GDAL / pyogrio that you'd like to suppress?

You should be able to use warnings.filterwarnings to suppress warnings and try / except to suppress errors, but it is possible some of the GDAL errors / warnings are still making their way through.
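A minimal sketch of that approach, using only the standard library. The helper name call_quietly and its patterns parameter are hypothetical (not part of pyogrio or geopandas); the point is that warnings.catch_warnings scopes the filters to a single call so they do not leak into the rest of the program:

```python
import warnings


def call_quietly(func, *args, patterns=(), **kwargs):
    """Call func with Python warnings matching `patterns` ignored.

    `call_quietly` and `patterns` are illustrative names only. The filters
    are installed inside warnings.catch_warnings(), so the previous filter
    state is restored as soon as the call returns or raises.
    """
    with warnings.catch_warnings():
        for pattern in patterns:
            # `message` is a regex matched against the start of the warning text.
            warnings.filterwarnings("ignore", message=pattern)
        return func(*args, **kwargs)
```

With geopandas this could be used as call_quietly(gpd.read_file, gdb, engine="pyogrio", patterns=[".*Non closed ring detected.*"]). It only affects warnings raised through Python's warnings module; anything GDAL prints by itself is not covered.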

cheginit commented 1 year ago

Sure. This is what I have:

if self._engine == "pyogrio":
    import pyogrio

    try:
        pyogrio.set_gdal_config_options(
            {"OGR_GEOMETRY_ACCEPT_UNCLOSED_RING": "YES", "OGR_ORGANIZE_POLYGONS": "SKIP"}
        )
        warnings.filterwarnings("ignore", message=".*Non closed ring detected.*")
        warnings.filterwarnings("ignore", message=".*translated to Simple Geometry.*")
        return gpd.read_file(gdb, engine="pyogrio", use_arrow=True)
    except GEOSException:
        return gpd.read_file(gdb)
else:
    return gpd.read_file(gdb)

But I still get "Warning 1: Non closed ring detected." and "Warning 1: Geometry of polygon cannot be translated to Simple Geometry. All polygons will be contained in a multipolygon."
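One possible explanation, stated here as an assumption rather than something confirmed in this thread: the "Warning 1: ..." prefix looks like the format GDAL's default error handler prints straight to stderr from C code, which Python's warnings filters never see. If that is what is happening, a stopgap that avoids the osgeo dependency is to redirect the process-level stderr file descriptor around the read. The helper name redirect_c_stderr below is hypothetical:

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def redirect_c_stderr(target_path=os.devnull):
    """Temporarily point file descriptor 2 (stderr) at `target_path`.

    Hypothetical helper: it silences anything written to stderr at the OS
    level, including output from C libraries, not just GDAL messages.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a handle on the real stderr
    target_fd = os.open(target_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.dup2(target_fd, 2)  # fd 2 now writes to target_path
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(saved_fd)
        os.close(target_fd)
```

Usage would be along the lines of "with redirect_c_stderr(): gdf = gpd.read_file(gdb, engine="pyogrio")". This is a blunt tool: it also hides unrelated stderr output produced during the block, so it is best kept around the single read call.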

brendan-ward commented 1 year ago

Thanks for the extra info, that should help us consider how best to suppress warnings like these from GDAL.

Are you able to share a small subset of your dataset with a record that triggers that error? I'm thinking it might be hard for us to fabricate a test dataset with such issues.

cheginit commented 1 year ago

That's strange. While I was preparing a reproducible example and creating an environment that includes only the necessary packages, the warnings didn't show. But when I include other packages that I need for my project, the warnings appear again. Is it possible that some other packages can cause warnings to not get caught?

The source code and the dataset are public. Here's the code for retrieving the data:

from pygeohydro import EHydro
import numpy as np
import shapely
from pynhd import NLDI

nldi = NLDI()
flw = nldi.navigate_byid("nwissite", "USGS-14246900", "upstreamMain", "flowlines", 400)

ehydro = EHydro()
idx = ehydro.survey_grid.sindex.query(shapely.box(*flw.total_bounds))
grid = ehydro.survey_grid.iloc[idx].reset_index(drop=True)
_, idx = grid.sindex.query(grid.geometry, predicate="intersects")
_, freq = np.unique(idx, return_counts=True)
grid = grid.iloc[np.where(freq > 1)[0]]
geom = grid.unary_union
bathy = ehydro.bygeom(geom, grid.crs)

The snippet downloads the datasets, which are saved as (many) zip files under the ./cache directory. So, if you want the files, you can find them there.

The relevant part of the code that uses pyogrio is here.

If you create a simple environment like this, the warnings will not be shown. I just pushed the latest commit to pygeohydro, so you need to install it from git:

mamba create -y -n ogr pygeohydro pyogrio ipykernel
mamba activate ogr
pip install --no-deps git+https://github.com/hyriver/pygeohydro

But if you create the environment from this file, the warnings are shown:

name: ogr
channels:
- conda-forge
- nodefaults
dependencies:
- python>=3.10

# async-retriever deps
- aiodns
- aiosqlite
- aiohttp >=3.8.3
- brotli
- cytoolz
- nest-asyncio
- aiohttp-client-cache >=0.8.1
- ujson

# pygeoogc deps
# - async-retriever>=0.15,<0.16
- cytoolz
- defusedxml
- joblib
- multidict
- owslib>=0.27.2
- pyproj>=3.0.1
- requests
- requests-cache>=0.9.6
- shapely>=1.8.5
- ujson
- url-normalize>=1.4
- urllib3
- yarl

# pygeoutils deps
- cytoolz
- geopandas >=0.7
- netcdf4
- numpy >=1.21
- pyproj >=2.2
- rasterio >=1.2
- rioxarray >=0.11
- scipy
- shapely >=2.0
- ujson
- xarray >=2023.01.0

# hydrosignatures deps
- numpy
- pandas
- scipy
- xarray
# optional deps
- numba

# py3dep
# - async-retriever >=0.3.6
- click >=0.7
- cytoolz
- numpy >=1.21
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.7
- rasterio >=1.2
- rioxarray >=0.11
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional dep
- pyflwdir >=0.5.6

# pynhd deps
# - async-retriever >=0.3.6
- cytoolz
- geopandas >=0.9
- networkx
- numpy >=1.21
- pandas >=1.0
- pyarrow >=1.0.1
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.7
- shapely >=2.0
# optional deps
- pyogrio
- py7zr

# pydaymet deps
# - async-retriever >=0.3.6
- click >=0.7
- lxml
- numpy >=1.21
- pandas >=1.0
# - py3dep >=0.13.7
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.9
- rasterio >=1.2
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional deps
- numba

# pygeohydro deps
- cytoolz
- defusedxml
- folium
- geopandas >=0.7
- h5netcdf
# - hydrosignatures >=0.1.1
- lxml
- matplotlib-base >=3.5
- numpy >=1.21
- pandas >=1.0
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.9
# - pynhd >=0.13.7
- rasterio >=1.2
- rioxarray >=0.11.0
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional deps
- planetary-computer
- pystac-client

# pynldas2
# - async-retriever >=0.3.6
- h5netcdf
- numpy >=1.21
- pandas >=1.0
# - pygeoutils >=0.13.10
- pyproj >=2.2
- rioxarray >=0.11
- xarray >=2023.01.0

# optional deps for speeding up some operations
- bottleneck

# bathy deps
- numba
- pandamesh
- shapelysmooth

# plotting deps
- mapclassify
- contextily
- hvplot
- tqdm
- xarray-spatial
- datashader

# dev deps
- ipywidgets
- ipykernel
- pre-commit

- pip
- pip:
  - git+https://github.com/hyriver/async-retriever.git
  - git+https://github.com/hyriver/hydrosignatures.git
  - git+https://github.com/hyriver/pygeoogc.git
  - git+https://github.com/hyriver/pygeoutils.git
  - git+https://github.com/hyriver/pynhd.git
  - git+https://github.com/hyriver/py3dep.git
  - git+https://github.com/hyriver/pydaymet.git
  - git+https://github.com/hyriver/pynldas2.git
  - git+https://github.com/hyriver/pygeohydro.git

brendan-ward commented 1 year ago

I wonder if one of the other packages in the larger environment is changing the state of warning filtering. Like you say, in the minimal environment I do not get these warnings if I filter them with warnings.filterwarnings("ignore", message=".*Non closed ring detected.*"), but I do get them if I don't filter them when I read one of the problematic MultiPolygon layers (e.g., read_dataframe(".../cache/CL_27_MGN_20150423.ZIP!CL_27_MGN_20150423.gdb", layer="Bathymetry_Vector")).

Within the same script, if I set warnings to show all warnings via warnings.simplefilter("always"), even after first setting the filter on warnings, then I see all instances of the GDAL warnings raised. I'm not sure how the state of warnings filtering gets updated across the packages you import. However, the only import I'm seeing within geopandas.read_file when using pyogrio imports pyogrio, which you already have in scope. So I'm not seeing a place where warnings would be filtered differently after you set them.
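The ordering effect described above can be reproduced with the standard library alone. This is a small sketch, not pyogrio code; the function names are made up for illustration. Filters added later are inserted at the front of the filter list by default, so a later "always" filter wins over an earlier "ignore":

```python
import warnings

MESSAGE = "Non closed ring detected."


def only_ignore():
    warnings.filterwarnings("ignore", message=".*Non closed ring detected.*")


def ignore_then_always():
    warnings.filterwarnings("ignore", message=".*Non closed ring detected.*")
    # Inserted in front of the "ignore" filter, so it is matched first.
    warnings.simplefilter("always")


def count_shown(setup):
    """Return how many warnings get through after `setup` installs filters."""
    with warnings.catch_warnings(record=True) as caught:
        setup()
        warnings.warn(MESSAGE, RuntimeWarning)
    return len(caught)


shown_with_ignore = count_shown(only_ignore)
shown_after_always = count_shown(ignore_then_always)
```

If some package imported in the larger environment calls warnings.simplefilter or warnings.resetwarnings after the user's filters are set, the same override would apply. Whether any package in that environment actually does so is not established here.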

I don't use conda / mamba enough to guess at how that might cause one environment to raise warnings and the other not to.

This doesn't negate wanting to add a global way of disabling warnings / errors from GDAL, just that warning suppression seems to be dependent on environment.

cheginit commented 1 year ago

I also suspect that the versions of GDAL and other packages differ between the two environments; this may have been an issue in earlier versions that was fixed in later ones.
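One way to check that hypothesis is to print the installed versions of the relevant packages in each environment and compare the output. A minimal sketch using importlib.metadata; the helper name installed_versions and the package list are illustrative choices, not something prescribed in this thread:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Illustrative list; extend it with whatever differs between the environments.
    for name, version in installed_versions(
        ["pyogrio", "geopandas", "shapely", "rasterio", "pyproj"]
    ).items():
        print(f"{name}: {version}")
```

This reports Python package versions only. For the GDAL library itself, pyogrio.__gdal_version_string__ should report the version pyogrio is using, which may differ from the one bundled with rasterio or another package in the same environment.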