Project-OSRM / osrm-backend

Open Source Routing Machine - C++ backend
http://map.project-osrm.org
BSD 2-Clause "Simplified" License

Deduplicate data for every InternalExtractorEdge #3767

Open: TheMarex opened this issue 7 years ago

TheMarex commented 7 years ago

Currently we split up each way in ExtractorCallbacks::ProcessWay into edges and duplicate all the data that we previously stored per way on a per-edge basis. This was fine two years ago, when that basically only included three or four values, but the amount of data has grown significantly since then.
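
Here is a minimal sketch of the current pattern, assuming a simplified attribute set. A way with n nodes is split into n - 1 edges, and each edge stores a full copy of the per-way attributes. `PerWayAttributes`, `DuplicatedEdge` and `SplitWayDuplicating` are hypothetical names, and the field types are guesses rather than OSRM's actual definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-way values listed in this issue;
// field types are assumptions, not OSRM's real layout.
struct PerWayAttributes {
    bool is_roundabout = false;
    bool is_circular = false;
    bool is_startpoint = true;
    bool is_restricted = false;
    std::uint8_t travel_mode = 0;
    std::uint32_t turn_lane_id = 0;
    std::uint32_t road_classification = 0;
    std::uint32_t name_id = 0;
};

// Every edge carries its own copy of the way's attributes.
struct DuplicatedEdge {
    std::uint64_t source_osm_id;
    std::uint64_t target_osm_id;
    PerWayAttributes attributes;  // copied once per edge
};

// Split a way (given as its OSM node ids) into consecutive edges,
// duplicating the attributes into each one.
std::vector<DuplicatedEdge> SplitWayDuplicating(const std::vector<std::uint64_t>& node_ids,
                                                const PerWayAttributes& attributes) {
    std::vector<DuplicatedEdge> edges;
    for (std::size_t i = 1; i < node_ids.size(); ++i) {
        edges.push_back({node_ids[i - 1], node_ids[i], attributes});
    }
    return edges;
}
```

A way with four nodes therefore produces three edges holding three identical copies of `PerWayAttributes`.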

There is really no reason to carry this data around with every edge. It makes more sense to split InternalExtractorEdge into two parts:

  1. DirectionalWayData, which includes all data we don't need to touch again until the EdgeBasedFactory:
    • is_roundabout
    • is_circular
    • is_startpoint
    • is_restricted
    • travel_mode
    • turn_lane_id
    • road_classification
    • name_id
      • Duplicate if we need a forward/backward version
  2. InternalExtractorEdge:
    • OSM start id
    • OSM target id
    • weight_data
    • duration_data
    • forward / backward flag
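
The proposed split could look roughly like the following sketch. It assumes that each edge refers to its DirectionalWayData through a 32-bit index (`way_data_id`) into a table that stores every distinct value once. The hash function, the `WayDataTable` class, the `SplitWay` helper and all field types (including the plain integers standing in for weight_data and duration_data) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <unordered_map>
#include <vector>

// Per-way data that is only needed again in the edge-based stage (types assumed).
struct DirectionalWayData {
    bool is_roundabout = false;
    bool is_circular = false;
    bool is_startpoint = true;
    bool is_restricted = false;
    std::uint8_t travel_mode = 0;
    std::uint32_t turn_lane_id = 0;
    std::uint32_t road_classification = 0;
    std::uint32_t name_id = 0;

    bool operator==(const DirectionalWayData& o) const {
        return std::tie(is_roundabout, is_circular, is_startpoint, is_restricted, travel_mode,
                        turn_lane_id, road_classification, name_id) ==
               std::tie(o.is_roundabout, o.is_circular, o.is_startpoint, o.is_restricted,
                        o.travel_mode, o.turn_lane_id, o.road_classification, o.name_id);
    }
};

// boost::hash_combine-style mixing over all fields.
struct DirectionalWayDataHash {
    std::size_t operator()(const DirectionalWayData& d) const {
        std::size_t seed = 0;
        auto mix = [&seed](std::size_t value) {
            seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        };
        mix(d.is_roundabout); mix(d.is_circular); mix(d.is_startpoint); mix(d.is_restricted);
        mix(d.travel_mode); mix(d.turn_lane_id); mix(d.road_classification); mix(d.name_id);
        return seed;
    }
};

// Slim per-edge record: only what changes from edge to edge, plus an index.
struct InternalExtractorEdge {
    std::uint64_t source_osm_id;
    std::uint64_t target_osm_id;
    std::int32_t weight_data;    // placeholder type
    std::int32_t duration_data;  // placeholder type
    bool forward;
    bool backward;
    std::uint32_t way_data_id;   // index into WayDataTable
};

// Stores each distinct DirectionalWayData exactly once.
class WayDataTable {
  public:
    // Returns the id of an equal existing entry, or appends a new one.
    std::uint32_t Intern(const DirectionalWayData& data) {
        const auto candidate = static_cast<std::uint32_t>(entries_.size());
        const auto result = index_.emplace(data, candidate);
        if (result.second) {
            entries_.push_back(data);
        }
        return result.first->second;
    }
    const DirectionalWayData& Get(std::uint32_t id) const { return entries_[id]; }
    std::size_t size() const { return entries_.size(); }

  private:
    std::vector<DirectionalWayData> entries_;
    std::unordered_map<DirectionalWayData, std::uint32_t, DirectionalWayDataHash> index_;
};

// Split a way into edges that all point at one shared table entry.
std::vector<InternalExtractorEdge> SplitWay(const std::vector<std::uint64_t>& node_ids,
                                            const DirectionalWayData& data, WayDataTable& table) {
    const std::uint32_t id = table.Intern(data);  // one lookup per way, not per edge
    std::vector<InternalExtractorEdge> edges;
    for (std::size_t i = 1; i < node_ids.size(); ++i) {
        edges.push_back({node_ids[i - 1], node_ids[i], 0, 0, true, true, id});
    }
    return edges;
}
```

Because `Intern` runs once per way, all edges of a way share one entry, and so do separate ways with identical attributes. A way that needs distinct forward and backward versions would simply intern two entries. A hash map is only one option. Collecting the entries and then sorting and removing duplicates would avoid keeping the map in memory.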

Since DirectionalWayData can be de-duplicated, we can significantly cut down memory and disk usage while running osrm-extract. This would relieve major pain points we currently have in our infrastructure around excessive disk usage.
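
The savings can be estimated with simple arithmetic. Let E be the number of edges, U the number of distinct DirectionalWayData entries, c the bytes of per-edge core data, w the bytes of way data and i the bytes of the index each edge stores instead. Duplication then costs E(c + w) bytes and deduplication costs E(c + i) + U·w, so deduplication wins whenever E(w - i) > U·w. The sketch below only encodes this formula. The byte sizes are parameters, not measured OSRM values.

```cpp
#include <cstdint>

// Estimated storage for both layouts, in bytes.
struct StorageEstimate {
    std::uint64_t duplicated;    // every edge carries its own copy of the way data
    std::uint64_t deduplicated;  // edges carry an index; unique way data stored once
};

// Sizes are inputs because real struct sizes depend on OSRM's types and padding.
constexpr StorageEstimate EstimateStorage(std::uint64_t num_edges,
                                          std::uint64_t num_unique_way_data,
                                          std::uint64_t edge_core_bytes,
                                          std::uint64_t way_data_bytes,
                                          std::uint64_t way_data_id_bytes) {
    return {num_edges * (edge_core_bytes + way_data_bytes),
            num_edges * (edge_core_bytes + way_data_id_bytes) +
                num_unique_way_data * way_data_bytes};
}
```

For example, with made-up sizes of 1000 edges, 10 distinct entries, 32-byte edge cores, 24-byte way data and a 4-byte index, the duplicated layout needs 56000 bytes and the deduplicated one 36240. If no two edges shared their way data (U = E), deduplication would instead cost an extra E·i bytes.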

/cc @danpat

TheMarex commented 7 years ago

Removing the milestone here. This is a bigger piece of work and would best be addressed in the context of an osrm-extract refactor.