Open brendan-ward opened 2 years ago
Another possibility is to leverage OpenStreetMap data, accessed using OSMnx for instance. OSM gives access to any kind of geometry, and even mixes of them: building shapes, administrative regions, railways, points of interest, water bodies...
I'm mentioning OSMnx because it's the package I know that makes it easiest to download and digest OSM data (see this example), but it could be anything else that does the trick. The downside of OSMnx for the purpose of this repo is that it requires networkx, which would be an unnecessary dependency here.
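To make the OSMnx idea concrete, here is a rough sketch of pulling a mix of geometry types for one place. The tag selection and the helper names (`MIXED_TAGS`, `fetch_osm_features`, `geometry_type_counts`) are my own assumptions for illustration, and `features_from_place` is the name in recent OSMnx releases (older ones called it `geometries_from_place`).

```python
# Illustrative OSM tag filter (an assumption, not a recommended set).
# True means "any value for this key".
MIXED_TAGS = {
    "building": True,    # mostly polygons
    "railway": True,     # mostly lines
    "natural": "water",  # water bodies
    "amenity": True,     # mostly points of interest
}


def fetch_osm_features(place, tags=None):
    """Download OSM features for a place name as a GeoDataFrame (needs network)."""
    # Imported lazily so this module still loads where OSMnx is not installed.
    import osmnx as ox

    return ox.features_from_place(place, MIXED_TAGS if tags is None else tags)


def geometry_type_counts(geometries):
    """Count geometries by type, to see how mixed a download actually is."""
    counts = {}
    for geom in geometries:
        counts[geom.geom_type] = counts.get(geom.geom_type, 0) + 1
    return counts
```

Usage would be something like `gdf = fetch_osm_features("Piedmont, California, USA")` followed by `geometry_type_counts(gdf.geometry)`; the place name is just an example.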
OSM is a good source. For performance reasons, it may be better to get the larger data using pyrosm, but that is a minor detail.
British Ordnance Survey has a series of GB-wide open datasets with polygons, lines and points at https://osdatahub.os.uk/downloads/open.
A few data sources to consider for bigger benchmarks:
U.S. high resolution hydrography data
These are served per 4th-code watershed: download a *_gdb.zip, which contains the data in an ESRI File Geodatabase.
We use some of these in pyogrio
Useful for testing intersection of waterbodies and flowlines, clipping, etc.
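For reference, a minimal sketch of reading layers out of one of these geodatabases with pyogrio. It assumes the archive has already been unzipped to a `.gdb` directory; the default layer names and the helper names (`missing_layers`, `read_hydro_layers`) are assumptions, so check the actual layer list of the file you download.

```python
def missing_layers(requested, available):
    """Return the requested layer names that are not present, sorted."""
    present = set(available)
    return sorted(name for name in requested if name not in present)


def read_hydro_layers(gdb_path, layers=("NHDFlowline", "NHDWaterbody"), max_features=None):
    """Read the given layers from a File Geodatabase into GeoDataFrames."""
    # Imported lazily so this module still loads where pyogrio is not installed.
    import pyogrio

    # list_layers returns rows of (layer name, geometry type).
    available = [row[0] for row in pyogrio.list_layers(gdb_path)]
    missing = missing_layers(layers, available)
    if missing:
        raise ValueError(f"layers not found in {gdb_path}: {missing}")

    # max_features caps the rows read, which is handy for a quick smoke test.
    return {
        layer: pyogrio.read_dataframe(gdb_path, layer=layer, max_features=max_features)
        for layer in layers
    }
```

Passing a small `max_features` first is a cheap way to confirm the layer names before loading a whole watershed.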
World Database on Protected Areas (see the download button)
One of the "advantages" of doing benchmarks with some of these is that the geometries are not always clean, so they could be good for benchmarking things like making them valid, unioning them together, or intersecting them with admin boundaries like countries or EEZs (below).
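As a rough idea of what such a benchmark could look like, here is a sketch that repairs invalid geometries, unions them, and times the call. The helper names (`time_call`, `repair_and_union`) are hypothetical, and taking the best of a few repeats is just one simple timing choice, not a proposal for how this repo should measure things.

```python
import time


def time_call(func, *args, repeat=3):
    """Run func(*args) `repeat` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def repair_and_union(geometries):
    """Make invalid geometries valid, then union everything into one geometry."""
    # Imported lazily so this module still loads where Shapely is not installed.
    from shapely.ops import unary_union
    from shapely.validation import make_valid

    repaired = [geom if geom.is_valid else make_valid(geom) for geom in geometries]
    return unary_union(repaired)
```

With a GeoDataFrame `gdf` of protected areas, `elapsed, merged = time_call(repair_and_union, list(gdf.geometry))` would give a first number to compare across libraries.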
Marine regions
For example, the World EEZ (Exclusive Economic Zones) dataset is a useful one to intersect with the marine protected areas above.