a-b-street / osm2streets

Convert OSM to street networks with detailed geometry
https://a-b-street.github.io/osm2streets
Apache License 2.0
96 stars 8 forks source link

Simplify the early stages of creating a `StreetNetwork` from OSM data #127

Open BudgieInWA opened 1 year ago

BudgieInWA commented 1 year ago

The start of main osm_to_street_network function is a complicated tangle of stateful modifications of the StreetNetwork with plenty of comments hinting at dependencies between seemingly unrelated steps. The delegation of responsibilities and relationship between StreetNetwork::blank, extract_osm, split_up_roads and clip_map is not clear, with mutations happening to the StreetNetwork all over the place.

https://github.com/a-b-street/osm2streets/blob/3d64f99d400b654367f40d86bee8678ade037e81/streets_reader/src/lib.rs#L102-L115

I think we should improve the public APIs of StreetNetwork and OsmExtract so that they help explain the ordering constraints and delegation of responsibilities. I'll share my ideas about that here, which requires a diversion into clipping and the StreetNetwork to OSM mapping.

Clipping the Map

Clipping is an interesting problem because the interpretation of the map features is influenced by nearby features, that's the whole reason osm2streets needs to exist. I think that StreetNetwork ought to treat its features fundamentally as areas, not centerpoints and centerlines. (That's why I'm advocating for geometry calculations to move up earlier in the process.) If that is the case, then precisely clipping a StreetNetwork has to involve cutting Roads and Intersections in half. This is not something that we should support at the representation level, because the representation still needs to use points and centerlines as reference for the distinct features (intersections and roads) that make up the areas.

In fact, it is an unavoidable problem that we can't accurately represent anything that is close to the edge of the raw data we have. We should acknowledge that fact and build it into our understanding and the API. There should be a buffer border of some distance around the area that we have data for that we consider to be nonsense output. Probably about half the width of a very wide road.

When doing the OSM to Streets conversion, I think we should trim the raw OSM to the boundary polygon in OsmExtract, before adding it into StreetNetwork. The trimming operation is better defined in the domain of the OSM data itself, because the OSM data is points and lines. There should be no clipping operation done on StreetsNetwork representation data, instead we could clip the end-result geometry precisely to the inside edge of the buffer boundary, or output that boundary and let the consumer do it.

Mapping Between StreetNetwork and OSM

A major usecase for StreetNetwork is empowering tools for authoring OSM data, so it will be useful to be able to get from some StreetNetwork feature back to OSM feature that defines it. From our recent discussions, it seems like it will be useful for transformations to have access back to the defining OSM too. That way they can use whatever weird properties of the OSM data they want, without us having to represent it first class within the StreetNetwork features.

The takeaway here is that StreetNetwork should store a full copy of the original OSM data (at least until the user wants to discard it) and maintain a mapping. Doing so makes it easier to reason about adding in additional details at later steps, like use_barrier_nodes and use_crossing_nodes does currently.

The Refactor

Ok, enough waffling about motivations, my proposal is to use private methods and "constructor" methods to delineate the responsibilities, and use public methods to communicate what is safe to do without worrying about internal data dependencies.

  1. Move the concept of a clip boundary into OsmExtract, with the invariant that it only stores data within the boundary. Don't be scared about discarding data just outside boundary, we already have no idea what features we can't see beyond the bounds of the raw data we do have, so lets be clear about our cut-off point.
  2. Implement OsmExtract::new(doc: osm::Document, boundary: Option<Vec<LonLat>>) -> Self which doesn't depend on anything StreetNetwork. It can clip the raw OSM features as it reads them from doc.
  3. Implement StreetNetwork::from_osm(osm: OsmExtract, config: MapConfig) -> Self which does split_up_roads internally, and whatever else is necessary to return a StreetNetwork that is ready to have its public API called.
  4. Make sort_roads and update_movements private again, and work towards all transformations doing their work using the public StreetNetwork API, instead of direct mutation.

Because StreetNetwork::insert_road exists the implication is that it handles some internal data dependency so the caller doesn't have to. split_up_roads should be private to StreetNetwork because it wants to be more efficient than using insert_road and is willing to understand the data dependency in order to do it safely. Transformations should use the public API because they should be non-vital and safe. The only danger of skipping them or calling them in the wrong order should be the missed opportunity for an easy improvement.

The idea is that an interactive editor could use StreetNetwork::new(bounds: GPSBounds, config: MapConfig) to create a blank map for the user, and use insert_intersection and insert_road to build up a map based on user input. Then transformations would be buttons available to "tidy up" the map. They can be clicked in any order as many times as the user likes.

dabreegster commented 1 year ago

Thanks for writing this up! I'm totally in favor.

The trimming operation is better defined in the domain of the OSM data itself, because the OSM data is points and lines.

I really like this idea! I think I've tried to get external tools to do that clipping before, but never gotten exactly the right thing working. Currently we tend to assume something like complete_ways strategy (https://osmcode.org/osmium-tool/manual.html) is passed in as input.

One consequence of rethinking the logic at the OSM level is treating very long roads or railways equivalently. Currently we special case railways that start and end outside the boundary but pass somewhere inside. No reason we shouldn't treat roads the same. (Old traffic sim assumptions leaking through were the original reason.)

The takeaway here is that StreetNetwork should store a full copy of the original OSM data

In https://github.com/a-b-street/osm2streets/tree/osm_tags I'm getting rid of osm_tags on Road. Separately a caller could hold onto a Document, which maps way IDs to Tags. A Road tracks multiple way IDs. From a code structure perspective, does that still seem like what we want to do?

I agree this would very nicely clean up current hacks around crossing_nodes. Right now a very downstream caller outside of this library makes use of that data, but they could do the association with their own road representation themselves. Eventually osm2streets will natively handle crossing and barrier points in some other way.

The idea is that an interactive editor...

That's sort of the intention behind the raw map editor, but in its current form, it's overly complex because of the old way that IDs used to be managed. The refactor you describe will make building this kind of app much, much easier.

BudgieInWA commented 1 year ago

I think I've tried to get external tools to do that clipping before, but never gotten exactly the right thing working. Currently we tend to assume something like complete_ways strategy (osmcode.org/osmium-tool/manual.html) is passed in as input.

Yeah, the complete_ways strategy is how I imaging raw OSM data coming in because that's how the data download in JOSM works. All ways are complete and some stick out of the bounding box. The UI then draws the downloaded area on the map (showing "out of bounds" as yellow hatches), so the user can understand both 1) the extent of the objects they are editing and 2) where there may be unseen features nearby or connected to the loaded objects.

image

Because the OSM data model allows splitting ways wherever the mapper wants to, I think any "out of bounds" data should be ignored consistently, so it would be really cool for OsmExtract to clip strictly to the boundary polygon, truncating ways that cross over the boundary by inserting dummy nodes right at the boundary. It would want to keep areas as closed loops by doing a boolean intersection I guess... and maybe the clipped result is less useful for areas, but I feel like clipping our linear highway features in this way is still the best compromise.

OsmExtract might need to allow multiple clipped objects per id for ways that leave the boundary then come back in, or otherwise represent which sections are outside the boundary.

One consequence of rethinking the logic at the OSM level is treating very long roads or railways equivalently. Currently we special case railways that start and end outside the boundary but pass somewhere inside. No reason we shouldn't treat roads the same. (Old traffic sim assumptions leaking through were the original reason.)

Yes, I'd like to see the clipping be exhaustive at the OSM level, doing proper intersection checks for every line segment, though moving the clipping logic out of StreetNetwork land and into OSM land will be an improvement, even if the implementation is a rudimentary to begin with.

The takeaway here is that StreetNetwork should store a full copy of the original OSM data

In osm_tags I'm getting rid of osm_tags on Road. Separately a caller could hold onto a Document, which maps way IDs to Tags. A Road tracks multiple way IDs. From a code structure perspective, does that still seem like what we want to do?

I like the idea of OsmExtract being the ergonomic collection of raw data that the caller would refer to. Maybe the StreetNetwork (optionally) stores the extract itself and maintains a two-way mapping as it makes edits. That is, Roads and Intersections etc. point back to the OSM IDs and OsmExtract points to StreetNetwork IDs. StreetNetwork would be responsible for keeping this mapping up to date as it modifies its representation. It depends if node_id_to_road_id and node_id_to_intersection_id are useful long term.

The idea is that an interactive editor...

That's sort of the intention behind the raw map editor, but in its current form, it's overly complex because of the old way that IDs used to be managed. The refactor you describe will make building this kind of app much, much easier.

Even before we get there, thinking about how StreetNetwork can encapsulate the complexities behind those sorts of operations is how I make progress on understanding those complexities :D

dabreegster commented 1 year ago

It would want to keep areas as closed loops by doing a boolean intersection I guess

We have some support for that today. Areas get more complex, because they're often relations with some of the members missing in the complete-ways extract. https://github.com/a-b-street/osm2streets/blob/117e8fbda84abca06e6da93cc74593df07f68e99/streets_reader/src/osm_reader/geom.rs#L28 attempts to use the boundary to glue things together. It's used for areas at https://github.com/a-b-street/abstreet/blob/ede907e23971427413b25982ef2c22e7c1309c6a/convert_osm/src/extract.rs#L149. Currently imperfect (https://github.com/a-b-street/abstreet/issues/32) and I think hard to get right. So just focusing on non-area ways for the sake of this bug makes sense to me.

allow multiple clipped objects per id for ways that leave the boundary then come back in, or otherwise represent which sections are outside the boundary.

This is the case that https://github.com/a-b-street/osm2streets/blob/117e8fbda84abca06e6da93cc74593df07f68e99/streets_reader/src/clip.rs#L33 tries to handle. From a traffic sim perspective, some very weird patterns can happen for agents going to/from those map edge intersections. It's actually a connected road in reality, but the connection happens just outside the boundary. Not sure there's a better answer than doing this duplication though.

StreetNetwork would be responsible for keeping this mapping up to date as it modifies its representation

Hmm, maintaining a two-way mapping sounds tougher. Right now each Road and Intersection points back to a list of OSM IDs. We can always go build the reverse mapping (many-to-many I think...) by looking at the whole StreetNetwork. If we wanted to cheaply maintain a two-way mapping, we'd have to decide where to store it and plumb it around. If it lives in StreetNetwork, the borrow checker might fight us if we have an &mut method on Road or Intersection changing IDs. But maybe we never tend to do that in practice.

dabreegster commented 1 year ago

I had a go at starting this idea, because our current haphazard handling of map edge geometry is complicating the geometry refactor. A few problems...

Tests: We don't have an explicit boundary for any of the tests. I think we need to go back and record one, because we otherwise assume a bounding box covering everything in the OSM input. It'd be good to do this anyway, so we can make obnoxious cases like quad_intersection stop stretching super far because of the old behavior about not clipping light rail. StreetExplorer could remember + generate a GeoJSON with the bbox passed to Overpass, and add a button to download it.

Implement OsmExtract::new(doc: osm::Document, boundary: Option<Vec>) -> Self which doesn't depend on anything StreetNetwork. It can clip the raw OSM features as it reads them from doc.

Currently OsmExtract has callback methods like handle_way. This is so a consumer like A/B Street can handle non-road stuff still. This is making me debate where the OSM-level clipping should happen. Maybe handle_way should take the Way, mutate it by clipping if needed, and return Option<Way> to the caller. If None, it was a road and has been handled. If not, the caller can use it, and additionally it's guaranteed to be clipped.

This gets a bit tricky with areas. We already have glue_multipolygon and a few other things in this repo anyway, only used by A/B Street. Maybe we move some of the area clipping logic from A/B Street back here. Eventually osm_reader + Document could be a separate Rust library, totally orthogonal to osm2streets concerns. There are a few OSM-reading crates out there, but I don't think any attempt clipping. Which is maybe kind of reasonable, because possibly the right thing to do depends on semantics deeper than the OSM data model. (A way can represent both an area and a linestring-like thing, and the clipping behavior for each is a bit different.)

I'm leaning towards a different API... after reading a Document, we optionally clip it. Nothing to do with OsmExtract -- it would already receive a clipped document. That avoids the funny handle_way API.

I think the complication of clipping at the document level is introducing geometry that doesn't come from OSM. Way already has both nodes and pts, so I think this'll be OK.

dabreegster commented 1 year ago

Something started in a branch; it's much simpler. Two more questions raised:

1) Is it time to get rid of OriginalRoad? Can we just remember a Road's osm_way_ids? Is it ever useful to track which segments they came from, or could we just stitch this together from the two endpoints osm_node_ids? 2) If we don't provide a clip and infer one from the OSM data, we'll never detect map edges. That's... kind of expected. Should we always require a boundary as input maybe, to simplify the API? I'll browse through all my use cases and see if there's always one readily at hand.

I'll open a few PRs to start pushing this through.

dabreegster commented 1 year ago

I had to back away a bit more from the idea of clipping an OSM document being blind to the types of things being clipped. When loop roads like https://www.openstreetmap.org/way/128464171 are split by a boundary, it was breaking previously. The previous way was also throwing out some geometry for large multipolygon relations that crossed the boundary.

We can revisit generic clipping again, but we need osm2areas or something with tests first. :) A/B Street importer is my proxy for that in the meantime.