azavea / osmesa

OSMesa is an OpenStreetMap processing stack based on GeoTrellis and Apache Spark
Apache License 2.0

Create an AugmentedDiffSource #74

mojodna closed this issue 6 years ago.

mojodna commented 6 years ago

Create an AugmentedDiffSource similar to ChangesSource and the other existing sources.

AugmentedDiffStreamProcessor currently uses Spark Streaming's textFile source, which polls a directory (or bucket prefix) to populate the stream. Because there is no way to specify a starting point, every JSON file already present under that path is loaded and processed.
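To show what an explicit starting point changes compared with directory polling, here is a minimal Python sketch of offset tracking. It is loosely modeled on the getOffset/getBatch contract of Spark Structured Streaming's (internal) Source trait. It assumes, hypothetically, that each augmented diff is stored as `<sequence>.json` in a single directory. The class `SequencedDiffSource` and its methods are illustrative names, not part of OSMesa or Spark. A real AugmentedDiffSource would be written in Scala against Spark's APIs and could read from a bucket prefix rather than a local directory.

```python
import os
import re
from typing import List, Optional, Tuple

# Hypothetical naming scheme: one augmented diff per sequence number, e.g. "3042917.json".
_DIFF_NAME = re.compile(r"^(\d+)\.json$")


class SequencedDiffSource:
    """Illustrative offset-tracking source over sequence-numbered JSON files."""

    def __init__(self, directory: str, start_sequence: int) -> None:
        self.directory = directory
        # Files whose sequence is below start_sequence are never returned.
        self.start_sequence = start_sequence

    def _available(self) -> List[Tuple[int, str]]:
        """(sequence, filename) pairs for matching files currently present, sorted by sequence."""
        found = []
        for name in os.listdir(self.directory):
            match = _DIFF_NAME.match(name)
            if match:
                found.append((int(match.group(1)), name))
        return sorted(found)

    def get_offset(self) -> Optional[int]:
        """Latest sequence at or after the starting point, or None if none exist yet."""
        seqs = [seq for seq, _ in self._available() if seq >= self.start_sequence]
        return seqs[-1] if seqs else None

    def get_batch(self, start: Optional[int], end: int) -> List[str]:
        """Paths for sequences in (start, end]; start=None means begin at start_sequence."""
        lower = self.start_sequence if start is None else start + 1
        return [
            os.path.join(self.directory, name)
            for seq, name in self._available()
            if lower <= seq <= end
        ]
```

Because the sketch only returns sequences at or after `start_sequence`, and each batch covers only the range since the last committed offset, restarting with a new starting point skips older files instead of reprocessing everything under the path.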

We had put off building a source for the interim JSON augmented diffs, but the drawbacks of the textFile source are becoming painful.

A side effect of implementing a first-class source is that Row mangling can be handled within the source itself.