Closed daniel-j-h closed 7 years ago
The Inertial Flow partitioner and the dummy partitioner outlined in https://github.com/Project-OSRM/osrm-backend/issues/3204 will share the same interface - only the implementation will change. Let's spec out what this interface will look like (and answer open questions).
Eventually we need a partition on the edge based graph for the customization to make use of it (the call site is outlined in https://github.com/Project-OSRM/osrm-backend/issues/3552). The Inertial Flow partitioner needs an embedding, which we already have for the node based graph. This, and the fact that @MoKob already had some ideas on how to transfer a partition on the node based graph to the edge based graph, makes partitioning the node based graph a viable option.
In case we want to make the partitioner a new tool `osrm-partition`, reading in the node based graph and writing out a partition for the node based graph (or already for the edge based graph?), ticket https://github.com/Project-OSRM/osrm-backend/issues/2055 is worth mentioning here: at the moment we write out the node based graph in `osrm-extract` and then read it back in immediately. Writing out the node based graph then still has to happen; reading it back in immediately in `osrm-extract` is still suboptimal. Can we already decouple `osrm-extract` from `osrm-partition` now? Should `osrm-partition` read in the node based graph and return a partition for the node based graph? It's more likely that we want to read in both the node based graph and the edge based graph from `osrm-extract`, partition the node based graph, transform the partition to the edge based graph, and write that out as a `.partition` file.

For the partitioner's interface, let's spec this out in two steps. First, partitioning the node based graph. And second, transforming the node based graph partition to the edge based graph.
We can load the node based graph (which is a NodeBasedDynamicGraph) and partition it with Inertial Flow. The output of this is a vector (keyed by node based node) storing a partition id for each node (the partition id is generated from a 0/1 (left/right) bisection per depth).
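The "partition id from a 0/1 (left/right) bisection per depth" scheme can be sketched as follows. This is only an illustration under the assumption that the bits are packed most significant bit first; the helper name is made up, not OSRM code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: compose a partition id from a sequence of 0/1
// (left/right) bisection decisions, one per recursion depth, packed
// most significant bit first. Nodes in the same cell at depth d then
// share the top d bits of their id. Assumes at most 32 depths.
std::uint32_t MakeBisectionID(const std::vector<int>& sides)
{
    std::uint32_t id = 0;
    for (std::size_t depth = 0; depth < sides.size(); ++depth)
        id |= static_cast<std::uint32_t>(sides[depth] & 1) << (31 - depth);
    return id;
}
```

With this layout a prefix comparison on the ids is enough to decide whether two nodes share a cell at a given depth.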
For the partitioner interface I can imagine something similar to what we had in rtd (check rtd/include/partition):

```cpp
vector<uint32_t> InertialFlow(const NodeBasedGraph& graph, const vector<Coordinate>& embedding);
```

leaving out tuning parameters and implementation specifics.
On the implementation side rtd just destroyed the original graph, creating two subgraphs (which was somewhat expensive but worked out for North America, even when we did things in parallel). It's not clear to me if we should do this again here. Also, in rtd we tried spatial orders again and again for each subgraph to get the best cut. What we could do instead this time: do n spatial orders on the full node based graph and keep them around. Never compute spatial orders again, but re-use them.
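The "compute n spatial orders once and re-use them" idea could look roughly like this. A sketch with made-up names, assuming the embedding is a flat coordinate vector keyed by node id:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Coordinate { double lon, lat; };

// Hypothetical sketch: pre-compute one global node order per direction.
// A subgraph's spatial order is then just the subsequence of the global
// order restricted to the subgraph's nodes, so no re-sorting is needed
// per recursion step.
std::vector<std::vector<std::size_t>>
SpatialOrders(const std::vector<Coordinate>& embedding, std::size_t num_directions)
{
    const double pi = 3.14159265358979323846;
    std::vector<std::vector<std::size_t>> orders(num_directions);
    for (std::size_t d = 0; d < num_directions; ++d)
    {
        const double angle = pi * static_cast<double>(d) / static_cast<double>(num_directions);
        const double dx = std::cos(angle), dy = std::sin(angle);
        auto& order = orders[d];
        order.resize(embedding.size());
        for (std::size_t i = 0; i < order.size(); ++i)
            order[i] = i;
        // sort node ids by their projection onto the direction (dx, dy)
        std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
            return embedding[a].lon * dx + embedding[a].lat * dy <
                   embedding[b].lon * dx + embedding[b].lat * dy;
        });
    }
    return orders;
}
```

Filtering a global order down to a subgraph is linear in the order's length, which trades memory (n full orders) for never paying the sort again.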
At this point we should have a partition id for each edge based graph node. We still need to combine several partition levels into a partition id usable for the customization (rtd called this annotation).
Should this already be done in `osrm-partition`, or is this something we can do in a separate tool / in `osrm-customize`? Should the `.partition` file be ready for customize (containing level ids) or be a full partition id map?

Please fill in the blanks and discuss! cc @MoKob @TheMarex @danpat @oxidase
To turn a node-based partition into an edge based one, there is essentially one item to consider. I tried modelling this in the image below (don't pay too much attention to the meanings of nodes, it's just to show how many connections there would be).
In the top left, I've put the node based graph with a very simple partition. If we simply used that partition, we would end up with more border nodes than we want, as shown in the right image (everything colored red would be, in some sense, connected to the border of the partition).
What we would have to do is introduce artificial nodes between every two nodes that are border nodes in the node-based partition. We introduce an intermediary segment of length 0 that we don't snap to.
Doing this, we end up with a paradigm that is very useful for us in implementing fast MLD queries: the `in` and the `out` node of a partition. Every node that is on the border has exactly one connection leaving it and one connection entering it. If you are at the `out` node, you don't look back into the cell you are coming from, but only forward. The `in` node only looks at the cell it is pointing into. This makes the query really efficient, since we don't have to scan over border vertices; we only have to scan over cells.
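The border-node split described above can be sketched as follows. This is a hypothetical illustration with made-up types, not OSRM's actual data structures; note it only adds a node and an arc, keeping existing ids stable:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Arc { std::uint32_t from, to, weight; };

// Hypothetical sketch: replace a border node b by an "in" node (all
// arcs entering keep pointing at b's original id) and a fresh "out"
// node (all arcs leaving the cell now start there), joined by one
// artificial zero-length arc that queries don't snap to.
void SplitBorderNode(std::uint32_t border_node,
                     std::uint32_t& next_free_id,
                     std::vector<Arc>& arcs)
{
    const std::uint32_t in_node = border_node;     // keep the existing id stable
    const std::uint32_t out_node = next_free_id++; // additive change only
    for (auto& arc : arcs)
        if (arc.from == border_node)
            arc.from = out_node; // arcs leaving the border node
    // single zero-length arc connecting the two halves
    arcs.push_back({in_node, out_node, 0});
}
```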
For the interface, I'll need some time to think about how we would best integrate this into our existing pipeline. It is not yet clear to me how we intend to split the different steps so that we can re-use much of what is happening. So I imagine that we should do a dedicated extract/partitioning step that creates a partitioned edge-based-graph? The partition data could potentially come in useful to guide the CH as well.
To summarise:
What about trying to adapt the inertial flow algorithm directly to the edge-based-graph? Instead of sorting by node coordinates directly, we could try sorting by the centroid of the edge-based node (e.g. the centroid of several segments). Or, as already proposed, just translate the partition from the node-based-graph to the edge-based-graph.
The main motivation here is that we want a clear separation between algorithm independent pre-processing (`osrm-extract`) and algorithm dependent pre-processing (`osrm-contract`, `osrm-partition`, `osrm-customize`) with minimal changes to the current architecture and code base. Optimizations are great and definitely needed down the road, but speaking from personal experience, anything that requires non-trivial changes will just end up in a huge refactor. We need to be conscious of which battles we pick here, and that usually means going the road of least resistance even if it is less optimal.

That is why I propose to work directly on the `.ebg` file we emit in `osrm-extract`. Doing this will keep the edge ids stable, enabling us to use this as a drop-in replacement.
Clarify the strategy for how to combine several depths (bits in the partition id) into a single level. For North America we had a tool which tried a few annotation strategies (combining the ranges 8/16/24/32 into four levels) and returned the best.
Trying to find an automated strategy here sounds great, as we work with data sets of widely varying size and need to account for that. Maybe we can base this on a target number of vertices per level? I think keeping the number of levels constant might be easier to work with down the road.
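Combining depth ranges into levels amounts to bit masking on the partition id. A minimal sketch, assuming a 32-bit id packed most significant bit first and the 8/16/24/32 range split into four levels; the function name is illustrative, not OSRM code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the annotation step: a 32-bit bisection id
// encodes one 0/1 cut per depth, most significant bit first. With the
// depth ranges 8/16/24/32 combined into four levels, the cell id at
// level l (1..4) is simply the top (8 * l) bits of the bisection id.
std::uint32_t CellIDAtLevel(std::uint32_t bisection_id, unsigned level)
{
    const unsigned bits = 8 * level;
    return bisection_id >> (32 - bits);
}
```

Other range splits would just change the per-level bit counts; the masking itself stays a constant-time operation.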
> Should this already be done in osrm-partition or is this something we can do in a separate tool / in osrm-customize? Should the .partition file be ready for customize (containing level ids) or be a full partition id map?
It would be great if we could make `osrm-partition` emit a "template" for all cell matrices with the complete partition information, so that `osrm-customize` only needs to fill in the blanks.
I have a few reasons why I would not separate these processing toolchains:
From our prototype we had the following measurements: 2 thread preprocessing, 90 GB memory requirement, 60000 seconds time, so 16.6 hours
Why use the node based graph:
Added benefits for CH:
Refactor Involved:
In total, this would be a battle that is actually worth picking to me.
I think we need to step back here a little bit and look at this from a high-level view and set some constraints:
`osrm-extract` is and will remain algorithm independent. That does not mean it could not output additional data that is only relevant for MLD (e.g. dump the compressed node based graph). When we run `osrm-extract` we don't want to know which algorithm we are going to use. This abstraction is the only reason `osrm-extract` exists: algorithm independent pre-processing.
The algorithm dependent part will be constrained to three tools: `osrm-partition`, `osrm-customize` and `osrm-contract`.
This split is important to retain because our whole processing tool-chain is built around it. Ideally we can ship MLD support without anyone noticing that it is there. Breaking compatibility will set us back at least three months (thinking back to the last time we did that), as we will need to adapt all downstream users and triage release processes.
The goal of the initial implementation is: Change as little as possible to get MLD into OSRM. That is step one, anything beyond that is something we should absolutely discuss when we actually have something running and a system we can improve incrementally.
With these constraints we need to keep the general structure of the edge-based-graph intact, because the data indexing we compute in `osrm-extract` is bound to it. Literally any partition scheme that fulfills these constraints is fine: translating from the node-based-graph to the edge-based-graph and working directly on the edge-based-graph are both acceptable.
This, to me, is just another case of you knowing what should be done, but telling no one in advance. Adding a single parameter to `osrm-extract` to generate a partition would not break the chain, at least in my view. But as long as you don't share your overall plan for OSRM and MLD, I have no idea what should be done.
The basic steps for `osrm-partition` are:

1. Load the node based graph.
2. Run the partitioner on the node based graph.
3. Load the `.ebg` file emitted by `osrm-extract`.
4. Map each node `x` in the EBG to its pair `(u, v)` in the NBG.
5. For each node `x` in the EBG, if `u` and `v` are in the same partition, assign that partition.
6. Fix up the nodes `x` where `u` and `v` are in different partitions.

Up until 4 it should be clear by now, so I'm going to focus on 4-6.
At this point we have a partition of the node based graph, staying in the same example as @MoKob above:
What we need is something to translate from the NBG node ids to EBG nodes, because that is what we want to partition eventually. In order to do that we need to write out more auxiliary information while the EBG is generated to get that mapping.
The best location to capture this is probably here. You will need to write out the following information:
- `node_u` (the NBG source node of the edge based node)
- `node_v` (the NBG target node of the edge based node)
You load the edge based graph, which is a list of edges, and construct a (dynamic) graph from them. The best example of how to do that is in the constructor of GraphContractor.
Now we iterate over all nodes in that graph and look up the pair `u`, `v` in the NBG.

In the example above, the small dots are the nodes in the EBG overlaid over the NBG for orientation. The nodes in the middle that are colored in red and green are the border vertices that we need to fix.
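The translation from the NBG partition to EBG nodes can be sketched as follows. A hypothetical illustration: the struct, the sentinel and the function names are made up, not the actual OSRM types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: each EBG node x maps back to the NBG pair (u, v)
// it was generated from. If both endpoints carry the same NBG partition
// id we assign it directly; otherwise x is a border node that needs a
// separate fix-up pass later.
constexpr std::uint32_t INVALID_PARTITION = 0xFFFFFFFF;

struct EBGNodeMapping { std::uint32_t node_u, node_v; };

std::vector<std::uint32_t>
TranslatePartition(const std::vector<EBGNodeMapping>& ebg_to_nbg,
                   const std::vector<std::uint32_t>& nbg_partition)
{
    std::vector<std::uint32_t> ebg_partition(ebg_to_nbg.size(), INVALID_PARTITION);
    for (std::size_t x = 0; x < ebg_to_nbg.size(); ++x)
    {
        const auto u = ebg_to_nbg[x].node_u;
        const auto v = ebg_to_nbg[x].node_v;
        if (nbg_partition[u] == nbg_partition[v])
            ebg_partition[x] = nbg_partition[u];
        // else: border node, left for the fix-up step
    }
    return ebg_partition;
}
```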
In the last step we need to resolve the problem arising from these border nodes we can't assign a partition to. There are at least two methods to fix these.
Just pick one of the partitions. This can be done randomly, or by checking how many border edges we would introduce by picking either partition (== counting the number of adjacent nodes in the EBG that have a different color). This of course has all the problems that @MoKob outlined above. However, for testing this is trivial to implement and gives you a partition instantly.
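The "count border edges, keep the majority" variant of this heuristic could look like this. A minimal sketch with a made-up adjacency representation, not OSRM code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: for a border EBG node whose NBG endpoints lie in
// partitions candidate_a and candidate_b, count how many neighbours
// already carry each candidate and keep the majority. More neighbours
// in the chosen partition means fewer border edges introduced.
std::uint32_t PickPartition(const std::vector<std::uint32_t>& neighbour_partitions,
                            std::uint32_t candidate_a,
                            std::uint32_t candidate_b)
{
    std::size_t votes_a = 0, votes_b = 0;
    for (const auto p : neighbour_partitions)
    {
        if (p == candidate_a) ++votes_a;
        else if (p == candidate_b) ++votes_b;
    }
    return votes_a >= votes_b ? candidate_a : candidate_b;
}
```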
Here we need to modify the graph in the way @MoKob suggested above. The important thing is that we are not allowed to modify the IDs of existing nodes in the EBG (because they are saved in the StaticRTree as entry points), but additive changes are fine:
A caveat of this approach is that we need to keep a marker around that identifies "real" arcs (i.e. non-border arcs). We can mark these edges by assigning them a special data ID (this is what we save, for example, here in the EBGF). Unpacking can then remove these edges easily before annotation.
The base version of this shipped using a simple heuristic that picks the `BisectionID` of `u` for the forward node and the `BisectionID` of `v` for the backward node.
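This shipped heuristic is trivial to express in code. A sketch with illustrative names (the type alias and struct are made up for this example):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: every NBG segment (u, v) yields a forward EBG
// node (u -> v) and a backward EBG node (v -> u). The forward node
// inherits the BisectionID of u, the backward node that of v.
using BisectionID = std::uint32_t;

struct SegmentIDs { BisectionID forward, backward; };

SegmentIDs AssignSegment(BisectionID id_of_u, BisectionID id_of_v)
{
    return {/* forward = */ id_of_u, /* backward = */ id_of_v};
}
```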
Once we have a dummy partitioner (ticketed in https://github.com/Project-OSRM/osrm-backend/issues/3204) we unblock further steps in the pipeline. Eventually we want to replace the dummy partitioner with a proper partitioning approach: Inertial Flow.
See: http://sommer.jp/roadseparator.htm
We already successfully prototyped an internal routing engine with that; some hints are over at http://daniel-j-h.github.io/post/selection-algorithms-for-partitioning/. Our multiple candidate cuts and parallelization should be doable in OSRM, too.
Tasks: