WikiWatershed / rapid-watershed-delineation

Rapid Watershed Delineation Code for MMW2
Apache License 2.0

Compress shapefiles #51

Open kdeloach opened 7 years ago

kdeloach commented 7 years ago

Investigate enabling compression on the NHD shapefiles to reduce file size. Smaller files could improve the responsiveness of RWD right after a new worker has been provisioned.

Ref: https://github.com/WikiWatershed/rapid-watershed-delineation/issues/46#issuecomment-267411647

mmcfarland commented 7 years ago

I don't believe there is an OGR driver for compressing/reading compressed shapefiles, but we should do this for the rasters via gdal drivers.
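As a rough illustration of the raster side, here is a minimal sketch that assumes the rasters are (or can be rewritten as) GeoTIFFs and that the GDAL Python bindings are installed. The helper names `geotiff_creation_options` and `compress_raster` are hypothetical, and the choice of DEFLATE with tiling is only one plausible setting, not something decided in this thread.

```python
def geotiff_creation_options(compress="DEFLATE", tiled=True, predictor=None):
    """Build a list of GTiff creation options (hypothetical helper).

    COMPRESS, TILED and PREDICTOR are creation options of GDAL's GTiff driver;
    the defaults chosen here are assumptions for illustration.
    """
    options = ["COMPRESS=" + compress]
    if tiled:
        options.append("TILED=YES")
    if predictor is not None:
        # PREDICTOR=2 (horizontal differencing) often helps integer rasters.
        options.append("PREDICTOR=%d" % predictor)
    return options


def compress_raster(src_path, dst_path, **kwargs):
    """Rewrite src_path as a compressed GeoTIFF at dst_path."""
    # Imported lazily so the option builder works without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    gdal.Translate(
        dst_path,
        src_path,
        format="GTiff",
        creationOptions=geotiff_creation_options(**kwargs),
    )
```

Something like `compress_raster("in.tif", "out.tif", predictor=2)` would then write the compressed copy. Whether it actually loads faster on a fresh worker would still need to be measured.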

emiliom commented 7 years ago

In case it helps ... OGR can read zipped (or tarballed) shapefiles using the "virtual file system" (vfs/vsi) drivers or approach. I couldn't find good, focused documentation on this at the official GDAL/OGR website, other than this terse and unhelpful OGR "memory" driver page. But you can google for "vsi" or "vsizip" and find references easily. I found this old but good blog post, plus a somewhat old GDAL/OGR wiki page.
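For anyone trying this, a minimal sketch of the vsizip approach might look like the following. It assumes the GDAL Python bindings are available for the final step; the helper names (`zip_shapefile`, `vsizip_path`, `open_zipped_layer`) and the file names in the usage note are hypothetical, not part of the RWD code.

```python
import zipfile
from pathlib import Path

# Sidecar files that commonly make up one shapefile.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def zip_shapefile(shp_path, zip_path):
    """Bundle a shapefile and its existing sidecar files into a deflated zip.

    Returns the list of member names written to the archive.
    """
    shp = Path(shp_path)
    members = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for ext in SHAPEFILE_EXTENSIONS:
            part = shp.with_suffix(ext)
            if part.exists():
                zf.write(part, arcname=part.name)
                members.append(part.name)
    return members


def vsizip_path(zip_path, member=None):
    """Build a GDAL /vsizip/ path, optionally pointing at one archive member."""
    path = "/vsizip/" + Path(zip_path).as_posix()
    if member:
        path += "/" + member
    return path


def open_zipped_layer(zip_path, member):
    """Open a zipped shapefile with OGR without unpacking it first."""
    # Imported lazily so the helpers above work without GDAL installed.
    from osgeo import ogr

    datasource = ogr.Open(vsizip_path(zip_path, member))
    if datasource is None:
        raise RuntimeError("OGR could not open %s in %s" % (member, zip_path))
    return datasource
```

For example, after `zip_shapefile("flowlines.shp", "flowlines.zip")`, calling `open_zipped_layer("flowlines.zip", "flowlines.shp")` would read the layer straight from the archive. This says nothing about read performance, which is the open question here.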

There are probably important performance issues that you should read up on before pursuing that route, though. I've used vsi stuff sporadically, and not enough to know much about performance issues.

mmcfarland commented 7 years ago

That's a good tip, @emiliom. I've used vsicurl with GDAL. We'll check out vsizip to see whether compressing the shapefiles buys us much, and at what cost.

emiliom commented 7 years ago

Glad it was helpful, @mmcfarland. I'd be interested in hearing (say, via this github issue) what you learn about cost tradeoffs, once you've tested it -- regardless of whether that's in the next 24 hours or 24 weeks ...