azavea / osmesa

OSMesa is an OpenStreetMap processing stack based on GeoTrellis and Apache Spark
Apache License 2.0

Check for off-by-ones in Spark Streaming sources #75

Closed: mojodna closed this issue 6 years ago

mojodna commented 6 years ago

I'm still a bit fuzzy on how Offsets are handled. For the (pending) AugmentedDiffSource, a current sequence of 3094146 results in 3094145.json being fetched, which looks like an off-by-one.

Similarly, commit() is called with values that I don't expect.

Offset handling in ReplicationStreamMicroBatchReader is likely to blame: https://github.com/azavea/osmesa/blob/bf8565602ff13256281cfc611fb9ced4b1a3d397/src/common/src/main/scala/osmesa/common/streaming/ReplicationStreamMicroBatchReader.scala#L60-L99
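
To make the suspected off-by-one concrete, here is a minimal sketch of the two common ways an offset range can be turned into a list of sequence numbers. It is not taken from OSMesa: the object and method names are hypothetical, and the start offset of 3094145 is an assumption (the issue only states the current sequence). As far as I know, Spark's older `Source.getBatch` documents the range as (start, end], so a reader that instead treats the end offset as exclusive would fetch one sequence behind, which would match what is described above.

```scala
// Hypothetical illustration only; these names do not exist in OSMesa or Spark.
object OffsetRangeDemo {
  // Convention A: start is exclusive, end is inclusive, i.e. (start, end].
  def exclusiveStart(start: Int, end: Int): List[Int] =
    ((start + 1) to end).toList

  // Convention B: start is inclusive, end is exclusive, i.e. [start, end).
  def exclusiveEnd(start: Int, end: Int): List[Int] =
    (start until end).toList

  def main(args: Array[String]): Unit = {
    val lastProcessed = 3094145 // assumed start offset, not stated in the issue
    val current = 3094146       // current sequence reported in the issue

    // (start, end] fetches the current sequence.
    println(exclusiveStart(lastProcessed, current)) // List(3094146)

    // [start, end) fetches the previous sequence instead.
    println(exclusiveEnd(lastProcessed, current)) // List(3094145)
  }
}
```

If the reader mixes the two conventions (for example, computing the end offset one way and iterating the other), both the fetched file and the value later passed to commit() could be shifted by one.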