geotrellis / vectorpipe

Convert Vector data to VectorTiles with GeoTrellis.
https://geotrellis.github.io/vectorpipe/

OSM => VectorTiles :: (3) RDD[OSMElement] to RDD[Feature[G, D]] #3

Closed fosskers closed 6 years ago

fosskers commented 8 years ago

This is Part 3 of a series of issues documenting the process of creating a world's worth of VectorTiles from OSM Planet data. Please use these issues to discuss solutions.

OSM data has the concepts of Node, Way, and Relation, which don't directly correspond to GeoTrellis geometries.
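To make the mismatch concrete, here is a minimal sketch of the three OSM element types and one possible Way-to-geometry conversion. Every name in it (`OsmModelSketch`, `Node`, `Way`, `Relation`, `Member`, `Line`, `Polygon`, `wayToGeom`) is hypothetical and not taken from vectorpipe or GeoTrellis; the geometry types are plain stand-ins so the snippet runs without dependencies.

```scala
// Hypothetical, simplified model of the three OSM element types.
// None of these names come from the vectorpipe codebase.
object OsmModelSketch {
  type Tags = Map[String, String]

  final case class Node(id: Long, lat: Double, lon: Double, tags: Tags)
  final case class Way(id: Long, nodeRefs: Vector[Long], tags: Tags)
  final case class Member(ref: Long, role: String)
  final case class Relation(id: Long, members: Vector[Member], tags: Tags)

  // Minimal stand-ins for real geometry types.
  sealed trait Geom
  final case class Line(points: Vector[(Double, Double)]) extends Geom
  final case class Polygon(exterior: Vector[(Double, Double)]) extends Geom

  /** Resolve a Way's node references into (lon, lat) coordinates.
    * Returns None if any referenced Node is missing or fewer than two refs exist.
    * A closed Way (first ref == last ref, at least 4 refs) is treated as a Polygon,
    * which is a simplification: in practice tags also influence that decision. */
  def wayToGeom(way: Way, nodes: Map[Long, Node]): Option[Geom] = {
    val resolved: Vector[Option[(Double, Double)]] =
      way.nodeRefs.map(ref => nodes.get(ref).map(n => (n.lon, n.lat)))
    if (resolved.size < 2 || !resolved.forall(_.isDefined)) None
    else {
      val pts = resolved.flatten
      val closed = way.nodeRefs.size >= 4 && way.nodeRefs.head == way.nodeRefs.last
      Some(if (closed) Polygon(pts) else Line(pts))
    }
  }
}
```

A Way only holds Node ids, so its conversion needs a lookup of every referenced Node. A Relation has no coordinates of its own at all, which is why it has no direct geometric counterpart.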

Questions:

  1. What is necessary for such a conversion?
  2. What should D be?

fosskers commented 8 years ago

A new question has arisen from an initial foray into implementing this conversion. Relations can contain other Relations, and such chains can form a graph of Relations. Asking in the OSM IRC channel revealed that these Relation graphs should form Trees, but in reality (via human error) they often don't.

A Tree structure would be much simpler to handle when denormalizing Relation metadata across their children.

Question: Is it true that all OSM Relation graphs should be Trees, and that any current non-Tree Relation graphs are illegitimate?
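Whatever the answer, the property itself is cheap to test. The sketch below checks whether a Relation graph is a forest (a set of Trees). It assumes the graph is given as a map from a Relation id to the ids of the Relations it references directly; `RelationForestCheck` and `isForest` are hypothetical names, not part of vectorpipe.

```scala
object RelationForestCheck {
  /** `children` maps a Relation id to the ids of the Relations it references.
    * The graph is a forest if no Relation has more than one parent
    * and no cycle exists. */
  def isForest(children: Map[Long, Seq[Long]]): Boolean = {
    // Count how many distinct parents reference each child.
    val parentCounts: Map[Long, Int] =
      children.values.flatMap(_.distinct).groupBy(identity)
        .map { case (id, refs) => id -> refs.size }
    val singleParent = parentCounts.values.forall(_ <= 1)

    // With at most one parent per node, a node that cannot be reached
    // from any root (a node without a parent) must sit on a cycle.
    val allIds: Set[Long] = children.keySet ++ parentCounts.keySet
    val roots = allIds.filterNot(parentCounts.contains)

    @annotation.tailrec
    def reach(frontier: List[Long], seen: Set[Long]): Set[Long] = frontier match {
      case Nil                    => seen
      case id :: rest if seen(id) => reach(rest, seen)
      case id :: rest =>
        reach(children.getOrElse(id, Nil).toList ::: rest, seen + id)
    }

    singleParent && reach(roots.toList, Set.empty) == allIds
  }
}
```

A Relation shared by two parents and a pair of Relations that reference each other both fail this check, which matches the two kinds of non-Tree graphs discussed above.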

fosskers commented 7 years ago

At the time of posting, master reflects the near-completion of this feature. I'm going to dive into Part 2 on a separate branch, testing simple parse cases and feeding the results into Part 3's code.

Notes on Relation Graphs

The syntax of OSM XML is lax enough to allow meaningless data relationships. Relations can refer to other Relations with essentially no restrictions. This means that while we hope there are no oddities like cyclic Relation graphs, nothing about the XML or the change submission process prevents them. Part 3's code (currently) makes some arbitrary assumptions about what should be considered legal (and what data actually appears in reality) to provide rigour to its operations.

Most importantly, since "Conceptual Relations" have no geometric representation, they can't otherwise be stored in VectorTiles. But since these Relations can carry meaningful metadata, we've elected to disseminate it across any child elements that they reference. To do so in a sensible way, we detect any Relation graphs that may exist, break them into topologically sorted Trees, and disseminate downward.
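As an illustration of "disseminate downward", here is a minimal sketch that is not the code on master. It assumes the graph has already been reduced to Trees with known roots, and that metadata is a plain tag map. A pre-order walk from each root visits parents before children, which is a topological order for a Tree. The names `DisseminateSketch` and `disseminate`, and the rule that a child's own tag wins on a key clash, are assumptions made for the example.

```scala
object DisseminateSketch {
  type Tags = Map[String, String]

  /** Push each Relation's tags down to all of its descendants.
    * Assumes `children` describes a forest (no cycles, one parent per node).
    * On a key clash the descendant's own value wins; that policy is an assumption. */
  def disseminate(
    roots: Seq[Long],
    children: Map[Long, Seq[Long]],
    tags: Map[Long, Tags]
  ): Map[Long, Tags] = {
    // Not tail-recursive: depth is bounded by the height of the tree.
    def go(id: Long, inherited: Tags): Map[Long, Tags] = {
      val merged: Tags = inherited ++ tags.getOrElse(id, Map.empty[String, String])
      children.getOrElse(id, Nil).foldLeft(Map[Long, Tags](id -> merged)) {
        (acc, child) => acc ++ go(child, merged)
      }
    }
    roots.foldLeft(Map.empty[Long, Tags]) { (acc, root) =>
      acc ++ go(root, Map.empty[String, String])
    }
  }
}
```

If the input still contained a cycle, `go` would never terminate, which is one reason the cycle-breaking step has to come first.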

Note: While the current code does do the collection/graphing/treeing/breaking down, the metadata itself is left as ParSeq[(Long, Seq[ElementData])] and not yet disseminated to the elements referenced by each Long. I feel more discussion is necessary (a second pair of eyes) before I attempt that.
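For discussion, the missing last step could look roughly like the sketch below. It uses a plain `Seq` instead of `ParSeq` to stay dependency-free, and `ElementData` and `Element` are hypothetical stand-ins with invented fields, not the real vectorpipe types. The idea is simply to index the pairs by id and look each element up.

```scala
object AttachMetadataSketch {
  // Hypothetical stand-ins; the fields here are invented for the example.
  final case class ElementData(relationId: Long, tags: Map[String, String])
  final case class Element(id: Long, tags: Map[String, String])

  /** Index the (id, metadata) pairs by id, then pair each element with
    * whatever inherited metadata refers to it (possibly none). */
  def attach(
    elements: Seq[Element],
    inherited: Seq[(Long, Seq[ElementData])]
  ): Seq[(Element, Seq[ElementData])] = {
    // Several pairs may target the same id, so concatenate their metadata.
    val byId: Map[Long, Seq[ElementData]] =
      inherited.groupBy(_._1).map { case (id, pairs) => id -> pairs.flatMap(_._2) }
    elements.map(e => (e, byId.getOrElse(e.id, Seq.empty)))
  }
}
```

On an RDD the same pairing would more likely be a join keyed on the element id than an in-memory map lookup. How the inherited metadata should then be merged into each element's own data is the open question.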

fosskers commented 7 years ago

#6 has addressed a lot of this.

fosskers commented 7 years ago

This won't be closed until it is debugged further.