azavea / osmesa

OSMesa is an OpenStreetMap processing stack based on GeoTrellis and Apache Spark
Apache License 2.0

Write tiles as Tapalcatl 2 archives: zipped pyramids of vector tiles on S3 #49

Open mojodna opened 6 years ago

mojodna commented 6 years ago

(I'm paraphrasing a bit here, because I may be interpreting tapalcatl's intent slightly differently.) Tapalcatl (Python) facilitates storing multiple formats and tiles within a single meta-tile. The meta-tile is a zip file. (This is effectively the generalized tile equivalent of a Cloud Optimized GeoTIFF.)

I propose grouping pyramids of generated vector tiles (initially 8 zoom levels, but that may be too many depending on the size of the zip file index and file itself) and zipping them together prior to writing to S3. This will dramatically reduce the number of objects written to S3 (reducing latency and cost) while facilitating improved caching of data.
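A minimal Python sketch of the grouping step, for illustration only (osmesa itself is Scala). It assumes sub-pyramids are rooted at zooms that are multiples of the pyramid depth and that entries keep their absolute z/x/y names; `zip_pyramids` and `zooms_per_archive` are hypothetical names, not part of tapalcatl or osmesa.

```python
import io
import zipfile
from collections import defaultdict


def zip_pyramids(tiles, zooms_per_archive=8):
    """Group tiles into sub-pyramids and zip each group in memory.

    tiles: dict mapping (z, x, y) -> encoded tile bytes.
    Returns a dict mapping the root tile (z, x, y) of each sub-pyramid
    to the bytes of its zip archive.
    """
    groups = defaultdict(dict)
    for (z, x, y), data in tiles.items():
        # Assumption: archives are rooted at zooms 0, 8, 16, ...
        root_z = (z // zooms_per_archive) * zooms_per_archive
        shift = z - root_z
        groups[(root_z, x >> shift, y >> shift)][(z, x, y)] = data

    archives = {}
    for root, members in groups.items():
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            for (z, x, y), data in sorted(members.items()):
                # Assumption: entries keep their absolute z/x/y names.
                zf.writestr(f"{z}/{x}/{y}.mvt", data)
        archives[root] = buf.getvalue()
    return archives
```

Each value in the returned dict would then be written to S3 as a single object, so the object count drops from one per tile to one per sub-pyramid.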

Under this proposal, if /4/5/6.mvt (the target tile) were requested, /0/0/0.zip (the meta tile) would be fetched and /4/5/6.mvt extracted from it.
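The lookup can be sketched in a few lines of Python. This assumes archives are rooted at zooms that are multiples of the pyramid depth (8 here) and that entries are named by their absolute coordinates, as in the example above; `locate_tile` is a hypothetical helper.

```python
def locate_tile(z, x, y, zooms_per_archive=8):
    """Return (archive key, entry name) for a requested tile."""
    # Assumption: archives are rooted at zooms 0, 8, 16, ...
    root_z = (z // zooms_per_archive) * zooms_per_archive
    shift = z - root_z
    # Dropping `shift` low bits gives the ancestor tile at the root zoom.
    archive = f"{root_z}/{x >> shift}/{y >> shift}.zip"
    entry = f"{z}/{x}/{y}.mvt"
    return archive, entry


print(locate_tile(4, 5, 6))  # ('0/0/0.zip', '4/5/6.mvt')
```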

Zip files include a central directory with entry offsets, so partial reads of the zip file are possible. A tile takes 3 requests: 2 for the directory (which can be cached) and 1 for the entry itself (which can be cached as part of larger block reads, e.g. 10MB-aligned). See https://github.com/mojodna/tilelive-tapalcatl for a really preliminary implementation of this.
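To make the three requests concrete, here is a rough Python sketch (it is not the tilelive-tapalcatl code). `read_range(offset, length)` is a hypothetical callable that would issue an HTTP Range request against S3; the object size is assumed to be known up front. Zip64 archives, encrypted entries, and comments that happen to contain the end-of-directory signature are not handled.

```python
import struct
import zipfile
import zlib

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header
LFH_SIG = b"PK\x03\x04"   # local file header


def read_directory(read_range, size):
    """Requests 1 and 2: locate and parse the central directory."""
    # Request 1: the tail holds the 22-byte end record plus an optional
    # comment of at most 65535 bytes.
    tail_len = min(size, 22 + 65535)
    tail = read_range(size - tail_len, tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    fields = struct.unpack("<4s4H2IH", tail[pos:pos + 22])
    count, cd_size, cd_offset = fields[4], fields[5], fields[6]

    # Request 2: the directory itself (cacheable per archive).
    cd = read_range(cd_offset, cd_size)
    entries = {}
    p = 0
    for _ in range(count):
        f = struct.unpack("<4s6H3I5H2I", cd[p:p + 46])
        if f[0] != CDH_SIG:
            raise ValueError("corrupt central directory")
        method, csize = f[4], f[8]
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        name = cd[p + 46:p + 46 + name_len].decode("utf-8")
        entries[name] = (f[16], csize, method)  # f[16]: local header offset
        p += 46 + name_len + extra_len + comment_len
    return entries, cd_offset


def read_entry(read_range, entries, cd_offset, name):
    """Request 3: fetch and decompress a single entry."""
    offset, csize, method = entries[name]
    # The entry ends where the next local header (or the directory) begins.
    end = min([o for o, _, _ in entries.values() if o > offset] + [cd_offset])
    blob = read_range(offset, end - offset)
    f = struct.unpack("<4s5H3I2H", blob[:30])
    if f[0] != LFH_SIG:
        raise ValueError("corrupt local file header")
    start = 30 + f[9] + f[10]  # skip header, file name, and extra field
    data = blob[start:start + csize]
    if method == zipfile.ZIP_STORED:
        return data
    if method == zipfile.ZIP_DEFLATED:
        return zlib.decompress(data, -15)  # raw deflate stream
    raise NotImplementedError(f"compression method {method}")
```

The result of `read_directory` is what a reader would cache per archive, leaving a single range request per tile afterwards.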

Reader support for tapalcatl meta tiles could be implemented in client code, via a proxy server (tapalcatl-py) that allows individual tile requests, or as a Service Worker that can intercept tile requests + cache blocks.
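Whichever of these hosts the reader, the block caching could look roughly like the Python sketch below. `BlockCache` and `fetch` are hypothetical names; `fetch(offset, length)` stands in for a range request and may return fewer bytes at the end of the object. The cache is unbounded here, which a real reader would want to cap.

```python
class BlockCache:
    """Serve arbitrary byte ranges from aligned, cached blocks."""

    def __init__(self, fetch, block_size=10 * 1024 * 1024):
        self.fetch = fetch            # fetch(offset, length) -> bytes
        self.block_size = block_size  # 10MB by default, as suggested above
        self.blocks = {}              # block index -> bytes

    def read(self, offset, length):
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        parts = []
        for index in range(first, last + 1):
            if index not in self.blocks:
                # Only blocks that were never requested hit the network.
                self.blocks[index] = self.fetch(
                    index * self.block_size, self.block_size
                )
            parts.append(self.blocks[index])
        data = b"".join(parts)
        start = offset - first * self.block_size
        return data[start:start + length]
```

Because neighboring tiles tend to sit near each other in an archive, a read for one tile will often warm the blocks needed for the next.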

mojodna commented 5 years ago

More information:

mojodna commented 5 years ago

Now that a Tapalcatl 2 proof of concept exists (server implementation here: https://github.com/mojodna/lambda-tileserver), this can be a bit more concrete. saveInZips is a preliminary implementation of this approach (using sub-pyramids comprising 8 zooms, which should be configurable per https://beta.observablehq.com/@mojodna/tapalcatl-storage-calculator):

https://github.com/azavea/osmesa/blob/f74e9ef01b2c946c78d56271f520d6bf9d65ed44/src/common/src/main/scala/osmesa/common/GenerateVT.scala#L85-L115

However, for it to match the Tapalcatl 2 spec, it needs to write archive-level metadata (as ZIP comments), and the output process as a whole needs to create a meta.json at the root of the S3 bucket describing the archive.
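For illustration, attaching metadata as a ZIP comment is straightforward with Python's zipfile module. This sketch assumes the comment is JSON; the keys shown are placeholders rather than the field names the Tapalcatl 2 spec requires, and `write_archive` / `read_metadata` are hypothetical helpers.

```python
import io
import json
import zipfile


def write_archive(tiles, metadata):
    """Zip tiles and attach `metadata` as the archive comment (as JSON)."""
    comment = json.dumps(metadata, sort_keys=True).encode("utf-8")
    if len(comment) > 65535:
        raise ValueError("ZIP comments are limited to 65535 bytes")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in tiles.items():
            zf.writestr(name, data)
        zf.comment = comment  # written with the end record on close
    return buf.getvalue()


def read_metadata(archive_bytes):
    """Read the archive comment back as a dict."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return json.loads(zf.comment.decode("utf-8"))


# Placeholder keys; the real field names come from the Tapalcatl 2 spec.
archive_meta = {"minzoom": 0, "maxzoom": 7}
# Hypothetical body for the meta.json written once at the bucket root.
root_meta_json = json.dumps({"name": "example", "zooms_per_archive": 8})
```

Since the comment lives in the end record, a reader doing partial reads picks it up with the same request that locates the directory.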

Support for metatiles is a plus (and provides more knobs for controlling archive size).

Support for updating tiles should be included; this is likely to be more involved, as existing archives need to be downloaded and cached locally prior to being updated (in which case tiles belonging to the same archive should be grouped together).

On the upside, application-specific metadata can be included at the archive level to indicate which sequences have been applied (archive-wide), rather than using the existing hack, which inserts a feature into the center of a __sequences__ MVT layer with a list of sequences.
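A rough Python sketch of an update that also records the applied sequence in the archive comment. It assumes the comment is JSON and uses a hypothetical `sequences` key; `apply_updates` is an illustrative name. Zip entries cannot be replaced in place, so the archive is rewritten, which is why updates for the same archive should be batched into one call.

```python
import io
import json
import zipfile


def apply_updates(archive_bytes, updates, sequence):
    """Rewrite an archive with replaced/added tiles and note the sequence.

    updates: dict mapping entry name (e.g. "4/5/6.mvt") -> new tile bytes.
    """
    src = zipfile.ZipFile(io.BytesIO(archive_bytes))
    meta = json.loads(src.comment.decode("utf-8")) if src.comment else {}
    # Hypothetical key: archive-wide list of applied sequences.
    meta.setdefault("sequences", []).append(sequence)

    out = io.BytesIO()
    with src, zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            if name not in updates:
                dst.writestr(name, src.read(name))  # copy untouched tiles
        for name, data in updates.items():
            dst.writestr(name, data)
        dst.comment = json.dumps(meta).encode("utf-8")
    return out.getvalue()
```

A reader (or a later update run) can then check the comment to see whether a given sequence has already been applied to the whole archive.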