Project-OSRM / osrm-backend

Open Source Routing Machine - C++ backend
http://map.project-osrm.org
BSD 2-Clause "Simplified" License
6.18k stars 3.29k forks source link

Use delta encoding to pack geometry/weights/durations/datasources #5020

Open danpat opened 6 years ago

danpat commented 6 years ago

We have several data structures that we store that represent all the data along a section of road.

These are:

We store groups of these for each node-based-edge. For a long section of curved road, there will be a sequence of nodes, weights, durations and datasources that are very often fetched as a block and iterated over fully (either forward or backward).

There may be some significant space savings to be had by performing some delta encoding on these lists - I'm not sure, but if we're careful with our node numbering, nearby node ids may have very small deltas, allowing for perhaps as little as 4 bytes for the first node, then just 1 byte for subsequent nodes in a sequence.

We could fairly easily re-use some of the encoding tools built into libosmium (via https://github.com/mapbox/protozero) to do this encoding.

As long as we don't need random access (and from memory, we rarely do), the overhead of unpacking these lists should be fairly low, and the savings could be significant.

Because actual coordinates aren't stored in ordered groups like this, I'm not sure the same opportunity exists there.

TheMarex commented 6 years ago

I agree this definitely something we should explore. This could be made even more useful if we don't save duration/weight values for every segment but speed and rate which are unlikely to change a lot (and possibly add a per-segment distance).

One major complication is the way we currently apply speed updates:

If we use delta encoding, the length of every block depends on the data. Updating the duration/weights would mean we constantly need to rebuild the index/move blocks around because the block length changes.

Another problem that would need to be solved is how we construct an index NodeID -> Offset. We currently share the same index for all {forward,backward}_{durations,weights,datasources} (because the number of elements is always the same and each entry has a fixed size), with delta-encoding we would need an index per encoded value (4 bytes * #EBGNodes), which could eat our memory savings.

github-actions[bot] commented 2 days ago

This issue seems to be stale. It will be closed in 30 days if no further activity occurs.