Simplify the routing core and data format

abrensch commented 1 year ago

This is a major change I have in mind since some time now, but it's a major step and not clear yet whether this can be a patch developed in a branch, or rather is a follow-up project of BRouter

The issue is that the routing core of BRouter is far more complex then it has to be.

That became obvious after I implemented the "DirectWeaver": All the code that was used before to convert the micro-tiles into an intermediate format and all the cache-management to manage this converted data in Memory, this is all obsolete, but still part of the codebase. With the DirectWeaver, only two representations of the data left: the encoded data as stored in the RD5s, and the actual routing graph. Nothing in between anymore.

Another major part of the complexity is the design-decision to distinguish between "Network-Nodes" and "Transfer-Nodes". 2/3 of the Nodes are "Transfer-Nodes", meaning they are only intermediate points along a way with no foreign connections. The idea behind that design decision was that it saves processing time and memory to have a special, simpler treatment for transfer nodes, not processing them through the priority queue and not encoding the OSM-tagging for every single section along a path of transfer nodes. However, nowadays I question this design decision. It's probably still a good idea to have some shortcuts for transfer nodes, but it's definitely a bad idea to encode this distinction into the data-format. One reason is that, in reality, the answer to the question "what is a network node" depends on the routing profile. For a car profile, not every connecting footpath creates a network node. So if there would be some efficient processing for nodes that are effectively transfer-nodes, letting them bypass the priority queue, that should be enough. For the memory footprint of the routing graph, each transfer node would be represented by one single java object (because OsmNode and OsmLink do instance-sharing via "technical inheritance"), so that will be slightly more compared to the current implementation (where the list of transfer node is encoded in a byte-array), but not double it, so that does not justify the complexity introduced by the concept of transfer nodes.

And there are smaller issues blowing up the code and candidates for removal:

the logical separation of the 5x5 degree RD5's into 25 1x1 degree sections is probably bullshit. There were 2 design considerations behind that: 1) to access a small amount of data, only one (or few) of the 25 micro-tile-indexes has to be read and 2) the data for a small geographc region is better clustered in the physical file, making file access more efficient. Nowadays I think that 1) is irrelevant (at least since I changed from 1/80 degree microtiles to 1/32 degree microtiles...) and 2) can event be improved by just re-ordering the microtiles, without having a seperate logical layer.
- the logic to delay the decision what is the "winning path for a link" one more step is not validated and not proven to have any positive effect. It originated from theoretical considerations that having an "elevation-buffer" makes A*/Djikstra an incorrect algorithm. I think that just can be dropped or replaced by something simpler, perhaps simply an empirical correction to the future cost.
- the traffic-simulation (used for the "old" version of estimated traffic class) also blows up the code and is not used anymore
- the statistical encoder has a 3-pass logic but is needing only 2 passes

All just ideas now and no idea if and when to address that.

I wanted to share that thoughts and point out that BRouter is far from perfect and, given the amount of attention that the project hat, it's time to refactor it's core. Not necessarily by me, maybe someone else feels that this is interesting stuff… Little less then a year ago I already started to refactor the code for the statistical encoder, see https://github.com/abrensch/statcoding That could also be part of a refactoring effort.

afischerdev commented 1 year ago

Great ideas are coming up.

My two cent updates for the near future:

splitting OsmTrack in track and export logic, add something like OsmTrackFormatter
splitting RoutingEngine into engine and routing logic, add something like RoutingAlgorithm
remove 'seed' entry from BRouter main. But reuse logic to find 'green' areas for round trip generation

EssBee59 commented 1 year ago

Arndt, Yes, after N years of work a major change is a good idea. My expertise is not good enough to understand every point, but it looks very good.

I just liked to ask about a performance: Do you expect an enhancement with the change?

Regards

abrensch commented 1 year ago

I just liked to ask about a performance: Do you expect an enhancement with the change?

Performance is not the main objective, but simplicity, readability, maintainability, test-coverage..

Keeping the raw node-crunching-speed (>= 2 Miliion nodes per second per thread) on a similar level is clearly a constraint, but I would not bother about a 10% degradation.

However, there are other fields of performance especially for short distiance routings like caching waypoint-matches or rd5-index-tables that become possible with improved maintainability.

abrensch commented 8 months ago

This is a major change I have in mind since some time now, but it's a major step and not clear yet whether this can be a patch developed in a branch, or rather is a follow-up project of BRouter

Having some holidays I actually started cleaning up the core code and I created a branch "cleanupbranch" for this. Still a log way to go but seems I am still able to touch the code without breaking the unit-tests

I want to base all the encoding-stuff on the new "statcoding" github-repo, but for now I just copied the code into this branch of the brouter-repo just to save the dependency management while I'm working on both parts. Had to switch off checkstyle and PMD to get the statcoding part compiled, will switch back later. However, using Intellij in default config and looking for the warnings that produces.

Using Java-17 but not yet any non-Java-8 features.

But idee is to have the general-purpose code in a separate repo (probably not just the encoding/decoding stuff but also the collection classes)

Data format will be completely new and incompatible to current RD5.

But integration e.g. with BRouter-Web will not change (Not sure about the Android-App yet)

Some aspects will for the first time dissappear because they depend too much on the stuff I am going to delete. True for the RD5-Diff/delta Tools and for the suspect scanner.

But chances are that the cleanup-version will never become the production version of the pipelines we are operating, but just a "clean reference implementation" to build new projects upon.

So stay tuned.

regards, Arndt

abrensch / brouter

Simplify the routing core and data format #628