geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

ENH: Calculate feature count ourselves if GDAL returns unknown count (-1), better support OSM data #271

brendan-ward closed this 10 months ago

brendan-ward commented 11 months ago

Resolves #269, #272

GDAL returns a feature count of -1 for drivers that have specifically disabled the ability to get a feature count (e.g., the OSM driver).

We need this number to allocate arrays before reading features, so when GDAL returns -1 we now calculate the count ourselves; otherwise we cannot read from these drivers.
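
To illustrate the fallback described above, here is a minimal pure-Python sketch. It is not the pyogrio implementation: `StreamingLayer`, `reported_count`, `features`, and `feature_count` are hypothetical names made up for this example, and it assumes the fallback is a plain pass over the layer's features when the reported count is negative.

```python
class StreamingLayer:
    """Hypothetical stand-in for a layer whose driver refuses to report a count."""

    def __init__(self, items):
        self._items = list(items)

    def reported_count(self):
        # Mirrors the -1 "unknown" sentinel described in this PR.
        return -1

    def features(self):
        return iter(self._items)


def feature_count(layer):
    """Return the reported count, or count by iteration if it is unknown."""
    count = layer.reported_count()
    if count < 0:
        # The driver did not report a count, so walk the features once.
        count = sum(1 for _ in layer.features())
    return count


layer = StreamingLayer(["way/1", "way/2", "way/3"])
n = feature_count(layer)

# With a known count, output arrays can be sized before reading features.
buffer = [None] * n
```

The trade-off in this sketch is an extra pass over the data, which only happens for layers that report an unknown count.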

theroggy commented 11 months ago

Ideally there would also be a test covering this code path. But possibly no driver that behaves this way has write support?

brendan-ward commented 11 months ago

The fix for #272 can be verified manually using a sufficiently large (probably > 100 MB) OSM file from Geofabrik, e.g., Greece. Without the fix added in 31d7df344643e3b2c65a9491af68b6d3dac4d697, read_info returns a count of 0:

from pyogrio import read_info

assert read_info('greece-latest.osm.pbf', layer='lines')['features'] > 0

The fix applies to reading both with use_arrow=True and without it.