azavea / osmesa

OSMesa is an OpenStreetMap processing stack based on GeoTrellis and Apache Spark
Apache License 2.0

Cache improvements #47

Closed: mojodna closed this 6 years ago

mojodna commented 6 years ago

Prior to this change, cached stages were run twice: once to populate the cache (triggered by the write) and again wherever they were subsequently used.

The alternative would be to add a `cache` call to the write chain to avoid the ORC read overhead, but we've seen problems with `cache` on larger stages, where the job eventually fails.
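To make the trade-off concrete, here is a minimal Spark sketch of the two strategies, assuming a DataFrame-based pipeline. `writeAndReload` and `persistAndWrite` are hypothetical helper names, not OSMesa's actual code, and the temp-directory paths are placeholders. The first writes a stage to ORC and hands downstream consumers a DataFrame that scans those files (paying the ORC read cost but never recomputing the lineage). The second persists the stage before writing so that later actions can reuse the persisted blocks, which avoids the read-back but keeps the whole stage resident on executors.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  // Hypothetical helper: materialize `df` as ORC at `path`, then return a
  // DataFrame that reads the ORC output back. Downstream consumers scan the
  // written files instead of re-running the stages that produced `df`.
  def writeAndReload(spark: SparkSession, df: DataFrame, path: String): DataFrame = {
    df.write.mode("overwrite").orc(path)
    spark.read.orc(path)
  }

  // Hypothetical helper for the alternative: mark `df` for persistence (lazy),
  // then write it. The write is the first action, so it computes the stage and
  // fills the persisted blocks; later actions on the returned DataFrame can
  // reuse them. No ORC read-back, but large persisted stages stay on executors.
  def persistAndWrite(df: DataFrame, path: String): DataFrame = {
    val persisted = df.persist(StorageLevel.MEMORY_AND_DISK)
    persisted.write.mode("overwrite").orc(path)
    persisted
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-sketch")
      .getOrCreate()

    // Stand-in for an expensive upstream stage (placeholder computation).
    val expensive = spark.range(0, 1000).selectExpr("id * 2 AS value")
    val dir = java.nio.file.Files.createTempDirectory("cache-sketch").toString

    val reloaded = writeAndReload(spark, expensive, s"$dir/reloaded")
    val persisted = persistAndWrite(expensive, s"$dir/persisted")

    println(reloaded.count())  // 1000, read from the ORC files
    println(persisted.count()) // 1000, served from the persisted blocks

    persisted.unpersist()
    spark.stop()
  }
}
```

Reloading from ORC gives Spark a short, file-backed lineage for later jobs, which is one way to avoid re-running the upstream stages without holding a large stage in executor memory or disk cache.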

This also fixes the local FS cache implementation.

mojodna commented 6 years ago

@lossyrob this is the behavior I was referring to:

[screenshot]

Shuffle content doesn't seem to be preserved between jobs, and stages appear to be re-run:

[screenshot]