AmericanRedCross / osm-stats-workers

BSD 3-Clause "New" or "Revised" License

Map pipeline #1

Closed by kamicut 8 years ago

kamicut commented 8 years ago

The map pipeline is the following:

  1. Geo data comes from planet-stream
  2. The workers calculate the metrics but keep the geo data and add it to a cache
  3. The cache keeps the last 100 records for each hashtag
  4. The leaderboard displays the last 100 records in a loop for that hashtag's page
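
To make steps 2 to 4 concrete, here is a rough sketch of what the cache could look like. It is only an illustration under assumptions: an in-memory structure stands in for whatever cache we end up using, and the names `HashtagCache`, `add`, and `latest` are hypothetical, not existing code in this repo.

```python
from collections import defaultdict, deque

MAX_RECORDS = 100  # "the last 100 records for each hashtag"


class HashtagCache:
    """Hypothetical in-memory stand-in for the per-hashtag geo cache."""

    def __init__(self, max_records=MAX_RECORDS):
        # A bounded deque drops the oldest record once it is full.
        self._records = defaultdict(lambda: deque(maxlen=max_records))

    def add(self, hashtags, geo_record):
        """Store one geo record under every hashtag it was tagged with."""
        for tag in hashtags:
            self._records[tag.lower()].append(geo_record)

    def latest(self, hashtag):
        """Return the cached records for a hashtag, oldest first."""
        # Use .get so that reading an unknown hashtag does not create an entry.
        return list(self._records.get(hashtag.lower(), ()))
```

The leaderboard page for a hashtag would then loop over `latest(hashtag)`. Lowercasing the hashtag is an assumption about how we want to match tags, not something decided here.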

We need to figure out where to add the caching code. Should that be at the kinesis level, or at the lambda worker level?

cc @smit1678 @matthewhanson

matthewhanson commented 8 years ago

@kamicut what do you mean at the kinesis level? Do you mean when it is added to kinesis by planet-stream? I think it belongs in the lambda worker function. It can add the geometry to the cache when it adds it to the database.

kamicut commented 8 years ago

I was thinking that it could be possible to fire two types of lambda functions, one that stores in cache and one that calculates the metrics. There would be a separation of concerns and it could be simpler to debug. The disadvantage is that they might get out of sync.
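
Roughly, the split could look like the sketch below. It assumes the standard Kinesis-triggered Lambda event shape (base64 payload under `Records[].kinesis.data`) and a JSON payload with hypothetical `hashtags`, `geometry`, and `edits` fields. `CACHE` and `TOTALS` are in-memory stand-ins for the real cache and database, and both handler names are made up for illustration.

```python
import base64
import json
from collections import Counter, defaultdict, deque

# In-memory stand-ins (assumptions) for the real cache and metrics database.
CACHE = defaultdict(lambda: deque(maxlen=100))
TOTALS = Counter()


def decode_records(event):
    """Decode the base64-encoded JSON payloads of a Kinesis Lambda event."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event["Records"]
    ]


def cache_handler(event, context):
    """First function: only stores geometries, no metrics."""
    for changeset in decode_records(event):
        for tag in changeset["hashtags"]:
            CACHE[tag].append(changeset["geometry"])


def metrics_handler(event, context):
    """Second function: only updates edit totals, no geometry."""
    for changeset in decode_records(event):
        for tag in changeset["hashtags"]:
            TOTALS[tag] += changeset["edits"]
```

Each handler reads the same records independently, so a failure or delay in one does not block the other. That independence is also exactly why the two views can drift apart.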

matthewhanson commented 8 years ago

That certainly would be easier to debug, especially given the difficulties in unraveling the large amount of data in the lambda logs. Firing a second lambda function off the same kinesis stream would be no problem. What problems would we have if they got out of sync?

kamicut commented 8 years ago

It would be more of a display problem: the leaderboards would show different total edits than the map view. This would be most apparent for mappers who commit large changesets infrequently. Their edits would make the map but the leaderboard would be delayed.

I don't think this is a huge issue as long as we can process large changesets in a reliable way.

matthewhanson commented 8 years ago

I don't think it will end up being noticeable for the vast majority of users, as it looks like only a handful prefer large commits.

dalekunce commented 8 years ago

Keep in mind these were all very new mappers using only iD. Other mapathons have experienced users using JOSM all night long. I also usually favor the big commit (500+ changes).