Open cheginit opened 1 year ago
Are you wanting to hide only the warnings emitted by GDAL, or also hide warnings emitted by pyogrio when using the GDAL API? We also emit our own warnings that may indicate certain issues.
Fatal errors using GDAL are converted to Python exceptions instead of warnings; you can sidestep some of those with try / except blocks, but generally they indicate something failed badly.
I want to hide the warnings/errors when using pyogrio as the engine for reading files with geopandas. Seeing those warnings and errors is useful only once, so I can investigate and make the necessary changes, but I would rather not see them when I rerun the code. When I use gdal's quiet_errors with gpd.read_file it works well and hides the warnings, but I would prefer not to have osgeo as a hard dependency.
Can you give us some examples of warnings / errors emitted by GDAL / pyogrio that you'd like to suppress?
You should be able to use warnings.filterwarnings to suppress warnings and try / except to suppress errors, but it is possible some of the GDAL errors / warnings are still making their way through.
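As a rough illustration of that suggestion, here is a minimal sketch that uses only the standard library. The read_layer function is a hypothetical stand-in for a reader such as gpd.read_file, and the RuntimeWarning and OSError it uses are assumptions chosen for the demo, not the classes pyogrio actually emits.

```python
import warnings


def read_layer(path):
    """Hypothetical stand-in for a reader such as gpd.read_file."""
    if path.endswith(".missing"):
        raise OSError(f"cannot open {path}")
    # Assumed warning class; the real one depends on the library.
    warnings.warn("Non closed ring detected.", RuntimeWarning)
    return f"data from {path}"


def read_quietly(path):
    """Return the data, or None if the read fails, hiding one known warning."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning
        # text, hence the leading ".*".
        warnings.filterwarnings("ignore", message=".*Non closed ring detected.*")
        try:
            return read_layer(path)
        except OSError:
            # Swallowing the error is only reasonable when a fallback exists.
            return None
```

Wrapping the filter in warnings.catch_warnings() keeps the change local, so the rest of the program still sees the warning elsewhere.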
Sure. This is what I have:
if self._engine == "pyogrio":
    import pyogrio

    try:
        pyogrio.set_gdal_config_options(
            {"OGR_GEOMETRY_ACCEPT_UNCLOSED_RING": "YES", "OGR_ORGANIZE_POLYGONS": "SKIP"}
        )
        warnings.filterwarnings("ignore", message=".*Non closed ring detected.*")
        warnings.filterwarnings("ignore", message=".*translated to Simple Geometry.*")
        return gpd.read_file(gdb, engine="pyogrio", use_arrow=True)
    except GEOSException:
        return gpd.read_file(gdb)
else:
    return gpd.read_file(gdb)
But I still get "Warning 1: Non closed ring detected." and "Warning 1: Geometry of polygon cannot be translated to Simple Geometry. All polygons will be contained in a multipolygon."
Thanks for the extra info, that should help us consider how best to suppress warnings like these from GDAL.
Are you able to share a small subset of your dataset with a record that triggers that error? I'm thinking it might be hard for us to fabricate a test dataset with such issues.
That's strange. While I was preparing a reproducible example and creating an environment that includes only the necessary packages, the warnings didn't show. But when I include other packages that I need for my project, the warnings appear again. Is it possible that some other packages can cause warnings to not get caught?
The source code and the dataset are public. Here's the code for retrieving the data:
from pygeohydro import EHydro
import numpy as np
import shapely
from pynhd import NLDI
nldi = NLDI()
flw = nldi.navigate_byid("nwissite", "USGS-14246900", "upstreamMain", "flowlines", 400)
ehydro = EHydro()
idx = ehydro.survey_grid.sindex.query(shapely.box(*flw.total_bounds))
grid = ehydro.survey_grid.iloc[idx].reset_index(drop=True)
_, idx = grid.sindex.query(grid.geometry, predicate="intersects")
_, freq = np.unique(idx, return_counts=True)
grid = grid.iloc[np.where(freq > 1)[0]]
geom = grid.unary_union
bathy = ehydro.bygeom(geom, grid.crs)
The snippet downloads the datasets, which are saved as (many) zip files under the ./cache directory. So, if you want the files, you can find them there.
The relevant part of the code that uses pyogrio is here.
I just pushed the latest commit to pygeohydro, so you need to install it from git. If you create a simple environment like this, the warnings will not be shown:
mamba create -y -n ogr pygeohydro pyogrio ipykernel
mamba activate ogr
pip install --no-deps git+https://github.com/hyriver/pygeohydro
But if you create the env using this, the warning will be shown:
name: ogr
channels:
- conda-forge
- nodefaults
dependencies:
- python>=3.10
# async-retriever deps
- aiodns
- aiosqlite
- aiohttp >=3.8.3
- brotli
- cytoolz
- nest-asyncio
- aiohttp-client-cache >=0.8.1
- ujson
# pygeoogc deps
# - async-retriever>=0.15,<0.16
- cytoolz
- defusedxml
- joblib
- multidict
- owslib>=0.27.2
- pyproj>=3.0.1
- requests
- requests-cache>=0.9.6
- shapely>=1.8.5
- ujson
- url-normalize>=1.4
- urllib3
- yarl
# pygeoutils deps
- cytoolz
- geopandas >=0.7
- netcdf4
- numpy >=1.21
- pyproj >=2.2
- rasterio >=1.2
- rioxarray >=0.11
- scipy
- shapely >=2.0
- ujson
- xarray >=2023.01.0
# hydrosignatures deps
- numpy
- pandas
- scipy
- xarray
# optional deps
- numba
# py3dep
# - async-retriever >=0.3.6
- click >=0.7
- cytoolz
- numpy >=1.21
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.7
- rasterio >=1.2
- rioxarray >=0.11
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional dep
- pyflwdir >=0.5.6
# pynhd deps
# - async-retriever >=0.3.6
- cytoolz
- geopandas >=0.9
- networkx
- numpy >=1.21
- pandas >=1.0
- pyarrow >=1.0.1
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.7
- shapely >=2.0
# optional deps
- pyogrio
- py7zr
# pydaymet deps
# - async-retriever >=0.3.6
- click >=0.7
- lxml
- numpy >=1.21
- pandas >=1.0
# - py3dep >=0.13.7
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.9
- rasterio >=1.2
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional deps
- numba
# pygeohydro deps
- cytoolz
- defusedxml
- folium
- geopandas >=0.7
- h5netcdf
# - hydrosignatures >=0.1.1
- lxml
- matplotlib-base >=3.5
- numpy >=1.21
- pandas >=1.0
# - pygeoogc >=0.13.7
# - pygeoutils >=0.13.9
# - pynhd >=0.13.7
- rasterio >=1.2
- rioxarray >=0.11.0
- scipy
- shapely >=2.0
- xarray >=2023.01.0
# optional deps
- planetary-computer
- pystac-client
# pynldas2
# - async-retriever >=0.3.6
- h5netcdf
- numpy >=1.21
- pandas >=1.0
# - pygeoutils >=0.13.10
- pyproj >=2.2
- rioxarray >=0.11
- xarray >=2023.01.0
# optional deps for speeding up some operations
- bottleneck
# bathy deps
- numba
- pandamesh
- shapelysmooth
# plotting deps
- mapclassify
- contextily
- hvplot
- tqdm
- xarray-spatial
- datashader
# dev deps
- ipywidgets
- ipykernel
- pre-commit
- pip
- pip:
  - git+https://github.com/hyriver/async-retriever.git
  - git+https://github.com/hyriver/hydrosignatures.git
  - git+https://github.com/hyriver/pygeoogc.git
  - git+https://github.com/hyriver/pygeoutils.git
  - git+https://github.com/hyriver/pynhd.git
  - git+https://github.com/hyriver/py3dep.git
  - git+https://github.com/hyriver/pydaymet.git
  - git+https://github.com/hyriver/pynldas2.git
  - git+https://github.com/hyriver/pygeohydro.git
I wonder if one of the other packages in the larger environment is changing the state of warning filtering. Like you say, in the minimal environment, I do not get these warnings if I filter them with warnings.filterwarnings("ignore", message=".*Non closed ring detected.*"), but I do get them if I don't filter them when I try to read one of the problematic MultiPolygon layers (e.g., read_dataframe(".../cache/CL_27_MGN_20150423.ZIP!CL_27_MGN_20150423.gdb", layer="Bathymetry_Vector")).
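One way to check that hypothesis might be to snapshot warnings.filters before and after importing each suspect package. The sketch below is only a diagnostic idea; the helper names filters_added_by and filters_added_by_import are made up for illustration and are not part of any library.

```python
import importlib
import warnings


def filters_added_by(action):
    """Run action() and return the warning filters it inserted."""
    before = list(warnings.filters)
    action()
    return [f for f in warnings.filters if f not in before]


def filters_added_by_import(module_name):
    """Report the filters added while importing module_name.

    Only meaningful for a module that has not been imported yet, because
    Python runs module-level code once per process.
    """
    return filters_added_by(lambda: importlib.import_module(module_name))
```

Running filters_added_by_import on each dependency in a fresh interpreter, before anything else is imported, would show whether one of them installs an "always" or "default" filter at import time.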
Within the same script, if I set warnings to show all warnings via warnings.simplefilter("always"), even after first setting the filter on warnings, then I see all instances of the GDAL warnings raised. I'm not sure how the state of warnings filtering gets updated across the packages you import. However, the only import I see within geopandas.read_file when using the pyogrio engine is of pyogrio itself, which you already have in scope. So I'm not seeing a place where warnings would be filtered differently after you set them.
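The ordering effect described here can be reproduced with the standard library alone, since both filterwarnings and simplefilter insert at the front of the filter list by default and the first match wins. This is a minimal sketch; the RuntimeWarning category is an assumption for the demo, and count_shown is a hypothetical helper.

```python
import warnings


def count_shown(setup):
    """Apply setup() to an empty filter list and count warnings that get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        setup()
        for _ in range(3):
            warnings.warn("Non closed ring detected.", RuntimeWarning)
    return len(caught)


def ignore_only():
    warnings.filterwarnings("ignore", message=".*Non closed ring detected.*")


def ignore_then_always():
    ignore_only()
    # Inserted in front of the "ignore" filter, so it is consulted first.
    warnings.simplefilter("always")


def always_then_ignore():
    warnings.simplefilter("always")
    # Now the "ignore" filter is in front and takes precedence.
    ignore_only()
```

So any code that installs a broad filter after yours, including code run at import time by another package, would undo the suppression.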
I don't use conda / mamba enough to guess at how that might cause one environment to raise warnings and the other not to.
This doesn't negate wanting to add a global way of disabling warnings / errors from GDAL, just that warning suppression seems to be dependent on environment.
I also think that in the large environment the versions of gdal and other packages are not the same, and maybe this was an issue in previous versions that has been fixed in later ones.
GDAL's Python bindings have an option to suppress all warnings temporarily: gdal's quiet_errors function. I haven't been able to find similar functionality in pyogrio. It would be very helpful since the warning messages can be very extensive!
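For comparison, a context manager with a similar shape can be sketched on top of Python's warnings module. The name quiet_warnings is hypothetical and is not a pyogrio or GDAL API; it only affects warnings raised through Python's warnings machinery, not messages that GDAL writes to stderr itself.

```python
import contextlib
import warnings


@contextlib.contextmanager
def quiet_warnings():
    """Hypothetical helper: silence Python warnings inside a with-block."""
    with warnings.catch_warnings():
        # Filters are restored automatically when the block exits.
        warnings.simplefilter("ignore")
        yield
```

Usage would look like `with quiet_warnings(): gdf = gpd.read_file(gdb, engine="pyogrio")`, where gdb is the path from the snippet earlier in the thread.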