Closed mx-moth closed 1 year ago
shapely.wkb.dumps() / shapely.wkb.loads() exist and work, with one caveat:
>>> import shapely.wkb
>>> from shapely.geometry import Polygon, GeometryCollection
>>> empty = Polygon()
>>> isinstance(empty, Polygon)
True
>>> empty.is_empty
True
>>> round_trip = shapely.wkb.loads(shapely.wkb.dumps(empty))
>>> isinstance(round_trip, Polygon)
False
>>> isinstance(round_trip, GeometryCollection)
True
>>> round_trip.is_empty
True
Empty polygons come back as empty GeometryCollections for Reasons. This is easy enough to detect, so it shouldn't concern us.
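A minimal sketch of that detection, assuming the caller already knows the WKB is meant to hold a polygon. The name loads_polygon is hypothetical and not part of shapely or emsarray; it only wraps shapely.wkb.loads and swaps an empty non-polygon result for an empty Polygon.

```python
import shapely.wkb
from shapely.geometry import Polygon


def loads_polygon(data: bytes):
    """Load WKB expected to hold a polygon (hypothetical helper)."""
    geometry = shapely.wkb.loads(data)
    # An empty polygon may come back as an empty GeometryCollection,
    # so replace any empty non-polygon result with a fresh empty Polygon.
    if geometry.is_empty and not isinstance(geometry, Polygon):
        return Polygon()
    return geometry


round_trip = loads_polygon(shapely.wkb.dumps(Polygon()))
```

With this wrapper the round-tripped value is an empty Polygon whether or not the installed Shapely version preserves the type on its own.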
Caching geometry is no longer relevant, as polygon construction has been sped up dramatically by using new interfaces introduced in Shapely 2.0.0.
Dataset geometry as provided in Format.polygons could be cached. This would allow quicker repeated operations on known datasets.
Possible interface
Discussion
Geometry can be cached using a WKB GeometryCollection. This is standalone and unencumbered (unlike Shapefiles), understood by many readers (less important, this is an 'internal' representation)...
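As a rough illustration of the WKB GeometryCollection idea, the sketch below packs a sequence of polygons into one collection and reads them back in order. The names dump_polygons and load_polygons are hypothetical, and the sketch assumes every polygon is non-empty; datasets with empty or missing cells would need extra bookkeeping to keep indices aligned.

```python
import shapely.wkb
from shapely.geometry import GeometryCollection, Polygon


def dump_polygons(polygons) -> bytes:
    """Serialise polygons as one WKB GeometryCollection (hypothetical helper)."""
    return shapely.wkb.dumps(GeometryCollection(list(polygons)))


def load_polygons(data: bytes) -> list:
    """Read the polygons back in the order they were written (hypothetical helper)."""
    collection = shapely.wkb.loads(data)
    return list(collection.geoms)


cells = [
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
]
restored_cells = load_polygons(dump_polygons(cells))
```

The resulting bytes can be written to any file or archive member, since nothing in the format depends on sidecar files.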
Making polygons is one of the most expensive operations when opening a dataset, and most emsarray operations depend on geometry. Caching this makes sense. Should we leave open the option of caching more things? Perhaps caching things in a .tar, and each cached thing could be a file within there. Perhaps emsarray.cache.dump / emsarray.cache.load, which call each dump_foo / load_foo.
Do we bother with cache invalidation, i.e. if the model geometry has changed? Unsure how to do this without recomputing the entire geometry. Could possibly make a geometry hash based on the (emsarray version, format class, geometry variables)? As long as computing the hash is not detrimentally slow.
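One way the .tar layout and the geometry hash could fit together, sketched with the standard library only. Everything here is an assumption for illustration: dump, load and cache_key are hypothetical stand-ins rather than emsarray functions, cached things are treated as opaque bytes, and the geometry variables are assumed to be available as raw bytes keyed by variable name.

```python
import hashlib
import io
import tarfile


def cache_key(version: str, format_class: str, geometry_variables: dict) -> str:
    """Hash the inputs that determine the geometry (hypothetical helper)."""
    digest = hashlib.sha256()
    parts = [version.encode("utf-8"), format_class.encode("utf-8")]
    # Sort by name so the key does not depend on dict ordering.
    for name in sorted(geometry_variables):
        parts.append(name.encode("utf-8"))
        parts.append(geometry_variables[name])
    for part in parts:
        # Length-prefix each part so adjacent parts cannot run together.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def dump(entries: dict) -> bytes:
    """Write each cached thing as one file inside a tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def load(data: bytes) -> dict:
    """Read every file in the tar archive back into a name -> bytes mapping."""
    entries = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        for member in archive.getmembers():
            extracted = archive.extractfile(member)
            if extracted is not None:
                entries[member.name] = extracted.read()
    return entries


key = cache_key("0.1.0", "ExampleFormat", {"x": b"\x00\x01", "y": b"\x02\x03"})
archive_bytes = dump({"key": key.encode("ascii"), "polygons.wkb": b"placeholder"})
cached = load(archive_bytes)
```

On load, comparing the stored key entry against a freshly computed key would decide whether the cache is stale. Whether hashing the geometry variables is cheap enough depends on how large they are, which this sketch does not address.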