GeospatialPython / pyshp

This library reads and writes ESRI Shapefiles in pure Python.
MIT License
1.09k stars 259 forks source link

Inconsistent bounding box filtering #258

Closed lgolston closed 7 months ago

lgolston commented 1 year ago

PyShp Version

2.3.1

Python Version

3.11

Your code

import shapefile

fname = "dataverse_files/v6_1820_pref_pts_gbk.shp"
bbox = [19166000, 3461000, 20297000, 3596000]

sf = shapefile.Reader(fname, encoding='gbk')
print(len(list(sf.iterShapeRecords(bbox=bbox))))

Expected results

10

Actual results

301

Other notes

As part of a PR over at cartopy (SciTools/cartopy#2236) I was comparing the output from bounding box spatial filtering with PyShp and fiona for several shapefiles and found some differences. For this shapefile (which just contains Points), PyShp does not seem to filter at all. I think this is a bug? The data used is at https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/2K4FHX/V0WEFJ&version=1.0

lgolston commented 1 year ago

Okay, the pull request created hopefully addresses the problem with the shapefile mentioned above involving points.

Another case I tried is the 50m land and glaciated areas Natural Earth shapefiles, which both worked as expected.

However, at 10m resolution the Natural Earth land shapefile switches from containing many Polygon shapes to a small number of MultiPolygon, and the bounding box filtering appears to break down again. For instance, selecting a bbox over the ocean ([-25, -25, -5, -5]) for the land shapefile (https://www.naturalearthdata.com/downloads/10m-physical-vectors/, using v5.1.1), fiona returns only 2 records while pyshp returns 10 (out of the 11 total records). I suspect this is because fiona looks at individual polygons, while PyShp only considers the overall bounding box of the MultiPolygon (which has shapeType 5 as for Polygon): https://github.com/GeospatialPython/pyshp/blob/d3b6e2a91668867e4f2168d32b8f2c53abfb09d6/shapefile.py#L1318-L1325