azavea / osmesa

OSMesa is an OpenStreetMap processing stack based on GeoTrellis and Apache Spark
Apache License 2.0

Planned usage #149

Open baerbock opened 5 years ago

baerbock commented 5 years ago

Hi Seth,

what's the planned usage of osmesa?

Are there any functions not somehow related to statistics?

What about hosting a "How did you contribute?" tool on steroids?

Best wishes

mojodna commented 5 years ago

OSMesa isn't a product per se; it's more of a collection of functions that people can use to achieve their own specific goals. It's primarily concerned with assembling geometries (including those for many relation types) from historical and snapshot data, which can subsequently be measured, gridded, counted, etc. (The geometry assembly itself is moving to VectorPipe.) There are streaming sources for minutely OSM diffs and changeset replication streams, as well as Overpass-derived augmented diffs.
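
For anyone unfamiliar with the minutely diffs: each one is identified by a sequence number that maps to a file path. Here is a rough Python sketch of that mapping, assuming the public planet.openstreetmap.org replication layout (the `replication_url` helper and its defaults are hypothetical and are not part of OSMesa):

```python
# Assumption: the public OSM replication layout, where a sequence number is
# zero-padded to nine digits and split into three path segments.
DEFAULT_BASE = "https://planet.openstreetmap.org/replication/minute"


def replication_url(sequence: int, base: str = DEFAULT_BASE,
                    extension: str = ".osc.gz") -> str:
    """Build the URL of one replication file from its sequence number."""
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence must fit in nine digits")
    padded = f"{sequence:09d}"
    return f"{base}/{padded[0:3]}/{padded[3:6]}/{padded[6:9]}{extension}"


print(replication_url(1234567))
```

A streaming source would poll for the next sequence number and fetch the file at this URL; the same padding scheme should apply to other replication streams when given a different `base` and `extension`.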

Last week, @jenningsanderson and I used OSMesa to generate 1.4TB of historical geometries in 35 minutes using 2400 CPU cores. This included "minor versions" (where modified nodes changed a way's geometry) and covered all tagged nodes, all ways, and multipolygon and route relations. OSMesa also supports boundary relations, but we skipped them because the large geometries they produce can be problematic.
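
To illustrate what a "minor version" is: a way's geometry can change without the way itself receiving a new version, whenever one of its member nodes is edited. The following is a minimal Python sketch of that idea, not OSMesa's actual implementation. The `minor_versions` helper and the tuple layouts are hypothetical, and it assumes way versions are numbered consecutively from 1.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified records, each list sorted by timestamp.
WayVersion = Tuple[int, List[int]]              # (timestamp, member node ids)
NodeVersion = Tuple[int, Tuple[float, float]]   # (timestamp, (lon, lat))


def minor_versions(
    way_versions: List[WayVersion],
    node_history: Dict[int, List[NodeVersion]],
) -> List[Tuple[int, int, int]]:
    """Return (timestamp, major, minor) for each distinct geometry state."""
    result: List[Tuple[int, int, int]] = []
    for i, (start, node_ids) in enumerate(way_versions):
        # A way version stays current until the next way version appears.
        end = way_versions[i + 1][0] if i + 1 < len(way_versions) else float("inf")
        # Node edits strictly inside that window change the geometry
        # without bumping the way's own version number.
        node_edits = sorted({
            ts
            for nid in node_ids
            for ts, _ in node_history.get(nid, [])
            if start < ts < end
        })
        result.append((start, i + 1, 0))
        result.extend(
            (ts, i + 1, minor) for minor, ts in enumerate(node_edits, start=1)
        )
    return result
```

For example, a way with versions at timestamps 100 and 200 whose member node was moved at 150 yields three geometry states: 1.0 at 100, 1.1 at 150, and 2.0 at 200.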

We currently use it to populate (and update) a Postgres database to support Scoreboard (which can be considered an updated version of the Missing Maps leaderboards). This is where the bulk of the statistics functions are used.

We also use it to generate data cubes containing edit counts by day, gridded at ~zoom 9 and aggregated, to visualize OSM edit recency and to produce a time series heatmap of edits by month in Detroit.
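
Conceptually, such a cube is a count keyed by day and grid cell. Below is a small Python sketch under stated assumptions: it uses standard Web Mercator (slippy map) tile addressing for the zoom-9 grid, and the `lonlat_to_tile` and `edit_counts` helpers and the `(timestamp, lon, lat)` input layout are hypothetical rather than taken from OSMesa.

```python
import math
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator tile (x, y) containing a lon/lat point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges or near the poles stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def edit_counts(edits: Iterable[Tuple[int, float, float]], zoom: int = 9) -> Counter:
    """Count edits per (UTC day, tile x, tile y).

    Each edit is assumed to be (unix timestamp, lon, lat).
    """
    counts: Counter = Counter()
    for ts, lon, lat in edits:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        counts[(day, *lonlat_to_tile(lon, lat, zoom))] += 1
    return counts
```

Summing these counters over a month for the tiles covering one city gives the kind of input a time series heatmap needs.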


A variant of this produced data to render individual user heatmaps. (Not scalable to the entire OSM dataset in its current form due to the number of contributors.)

We've also used it to generate full-history MVTs (like OSM QA Tiles, but with all geometries, not just the most recent) at z15 for selected areas. The size of boundary relations and the current inability to generalize for lower zooms have prevented us from generating these for larger areas. These have been used to power visualizations like the mapping of Disneyland and an explorer for Detroit, to see what OSM looked like at any point in the past.

It's also useful for doing ad hoc analysis, e.g. looking at tag changes between element versions.
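
As a rough idea of what a tag comparison looks like, here is a minimal Python sketch that diffs the tags of two versions of an element. The `TagDiff` type and `diff_tags` function are hypothetical names for illustration, and tags are assumed to be plain string dictionaries.

```python
from typing import Dict, NamedTuple, Tuple


class TagDiff(NamedTuple):
    added: Dict[str, str]                # keys present only in the new version
    removed: Dict[str, str]              # keys present only in the old version
    changed: Dict[str, Tuple[str, str]]  # key -> (old value, new value)


def diff_tags(old: Dict[str, str], new: Dict[str, str]) -> TagDiff:
    """Compare the tags of two versions of the same element."""
    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return TagDiff(added, removed, changed)


v1 = {"highway": "residential", "name": "Main St"}
v2 = {"highway": "tertiary", "name": "Main St", "surface": "asphalt"}
print(diff_tags(v1, v2))
```

Run over consecutive versions of every element, this kind of diff shows, for example, how often a road is reclassified or when a `surface` tag was first added.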

It contains all of the components to build an HDYC clone (excepting OSM notes, changeset discussions, and heuristics for mapping numbers to things like "casual mapper"). What would you envision as "on steroids" that would make it worth embarking on such an endeavor? (I lack the imagination for this ;-)