TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project
4 stars 6 forks source link

HIDDEN_TERMINUS: unhide #360

Open yakra opened 3 years ago

yakra commented 3 years ago

When encountering a HIDDEN_TERMINUS, strip the leading + and make the point visible.

Python: I should be OK to continue doing this datacheck along with the others after graph gen.

yakra commented 3 years ago

related: #91

yakra commented 1 year ago

On second thought... Datacheck errors can get lost in the noise and go unaddressed for a long time. If these points remain hidden, someone might notice a hidden endpoint in the HB, and report the error in the forum.

yakra commented 1 year ago

On the 3rd hand, unhiding these could prevent ERROR: segment name mismatch in HGEdge collapse constructor. Setting the bool to 0 (#582) and keeping the leading + could allow both this & keeping a hidden endpoint in the HB as noted above.

yakra commented 1 year ago

As it stands now, flipping the Waypoint::is_hidden bool immediately when HIDDEN_TERMINUS is detected would mess up VISIBLE_HIDDEN_COLOC detection later on.

VISIBLE_HIDDEN_COLOC detection could be moved into route_integrity (probably a good idea anyway because multithreading).

Heartburn - duplicated code...

Nonetheless, this is probably still more efficient than waiting until the compression stage and searching for termini there, which would involve analyzing way more points and iterating through their colocation lists until a terminus is found.


On the 4th hand, becoming even more reliant on the is_hidden bool (vs. a function) could get in the way of getting sizeof(Waypoint) down to 96 and being able to fit 2 Waypoints in 3 cache lines, provided I find some way of getting rid of Waypoint::point_num too (edit: which is easy). On the 5th hand, how much time would that save? Most routines iterating & accessing Waypoints are already plenty fast.

Which is better? In any case, all this stuff is pretty low priority for now.

yakra commented 1 year ago

On the 6th hand, some theoretical changes to the HGEdge class would mean storing Waypoint::vertex not as a full pointer, but as a 32-bit index into an array. This means only Waypoint::is_hidden nudges us above an 8-byte alignment boundary. This begs for its removal if keeping a compact sizeof(Waypoint) is a priority. On the 7th hand, losing Waypoint::point_num & effectively replacing it with the 32-bit vertex array index means we're down 8 bytes (for the old HGVertex*) and still have room for the bool.

yakra commented 6 months ago

Yecch. OK, this issue kinda got to be about 2 things...

  1. Whether to keep is_hidden at all, when optimizing for object size. (Those mumblings were probably better suited for #582, but anyway.) My current thinking is, Waypoint will eventually have some unused space due to alignment requirements, so we may as well keep it around.
  2. Whether and how to unhide hidden termini in the graph data. Definitely should do so, to prevent segment name mismatch errors. Some info and a couple early & outdated ideas are here, though I'd like to keep that issue about what the title says. So I'll yammer on about how to unhide them here.

The segment name mismatch error can only occur at colocated points, never singleton. This is because it requires 2 things:

For a terminus to have 2 incident edges, either

Both of which require colocated points. A singleton terminus means no other route to end at, means only 1 incident edge.

yakra commented 6 months ago

How to do it

The goal is to set HGVertex::visibility = 2 and avoid collapsing its incident edges. Most of the time, that's already been done (via a colocated visible point, or <> 2 edges). Short-circuit evaluation makes the hidden terminus check rarely necessary.

There are several alternatives to consider.

These 2 approaches can each be done in 2 places:

Or, the other alternative considered upthread & in #589,

*Numbers represent HighwayData @ 2decc64. Not the most recent commit, but it's what I've been runnign tests on.

Of these alternatives, probably pick the most micro-optimizey one. :)

Before benchmarking:

yakra commented 6 months ago

A few more thoughts on bool Waypoint::hidden_terminus...

Propagating this bool's value to the front of the colocation list means...

This gets tricky when we remember that when the datacheck is performed, the quadtree is not yet fully populated.

Hm. The Waypoint::hidden_terminus path is looking less & less attractive. Maybe the best option will be to go with whatever has the least overhead, and when we get to graph generation, allow short circuit evaluation to work its magic?