Importing and interpolating 2D data

Project-OSRM / osrm-backend

Open Source Routing Machine - C++ backend

http://map.project-osrm.org

BSD 2-Clause "Simplified" License

6.31k stars 3.33k forks source link

Importing and interpolating 2D data #1102

Closed emiltin closed 9 years ago

emiltin commented 10 years ago

It would be great if OSRM would have a generic way to import and interpolate 2D data.

Data could include elevation, polution, amount of forest/parks, or anything else that's relevant when routing.

OSRM would store the data in an efficient internal format, for example as a number of 3D points (2D location + value). Data could be trimmed to only cover areas where there are road network. Raster data could be sampled, while vector or point data could perhaps be used more directly, or maybe resampled/reduced.

The 2D points could then be interpolated efficiently (using some wave function) at any desired point (in which case you get back a single value) or along any line or polyline (in which case you get a list of distance/value pairs which you can further interpolate if you want).

Interpolation would be easily accessible from LUA, both when preprocesssing data, and during (the proposed) postprocessing of computed routes.

As an example, the interpolation could be used to compute elevation penalties when preprocessing data; for each way you process in LUA, you pass the way to the interpolator and get a list of distance/value pairs and use it to compute a "hill slowdown/speedup". You have the flexibility to compute the penalty in LUA how you see fit based on the profle for the way segment.

Later during route queries, once a route has been computed, you can pass the polyline to the interpolator and get back a list of distance/value pairs representing the elevation profile, and pass that along with the route, so the front-end can display it.

You could have several of these data layers, representing different data types. Since the interpolator would be generic and not only for elevation, you could choose to also pass back several profiles to the front-end, for example evalation and polution.

The road network can be viewed as 2.5D. It's mostly flat, but at certain points and areas you have several layers, like bridges or parking houses. A way to handle this would be to have several data layers each datatype, and choose from LUA which one you query for each part of the route.

Since data is represented as 2D points, not much extra space would be needed. For example, consider elevation. Most of the data is at ground level, but bridges are cross on top of that, and at these points you have two elevations for the same point.

You would have several layers: "elevation_layer0" would cover all of the normal road network. "elevation_layer1" would cover only ways tagged with layer=1, and would therefore be tiny (since it only has 2D points where there are bridges). "elevation_layer2" would be even smaller, etc. When you preprocess in LUA you look at the layer tags of the way and interpolate from the relevant data layer. The result would be that when you go under the bridge you get one elevation, and when you go on the bridge, you get another.

TheMarex commented 9 years ago

If we add elevation data support to OSRM, it should be in form of something similar. Goals for this are:

Make it easy to include raster-data as information source for edge weight calculation
Don't add performance penalties if it is not used

What I'm currently thinking is:

add an (optional) edge_weight_function( .. ) to the lua profiles that is called then the final edge weight is computed
expose C++ functions for reading and sampling (nearest-neighbor, bilinear) of raster data

Having an additional call to lua for each edge has some performance penalties, but should be less than on parsing, as we don't need to copy that much data.

emiltin commented 9 years ago

i would much prefer to keep impedance computation as one step, rather than introduce a second post processing adjustments. there is often interplay between tags and data like elevation, polution; for example for a foot profile, elevation would be irrelevant for escalator. for a bike profile as use case if a profile that preferes green areas like parks. but this might be negated if you're on a big road, etc.

i would just provide some easy way to load and interpret the data from the normal lua way and note functions. this ensures maximum flexibility, and if you don't use the data interpolation, you don't add any extra overhead.

TheMarex commented 9 years ago

i would just provide some easy way to load and interpret the data from the normal lua way and note functions. this ensures maximum flexibility, and if you don't use the data interpolation, you don't add any extra overhead.

To add a little context here for the interested reader, parsing currently works like this: For nodes:

libosmium encounters a node
node_function in the lua profile is called
the result (id, location, if it is a barrier or traffic light) is saved to a vector

For ways:

libosmium encounters a way
way_function in the lua profile is called
the result (node references, way speed, travel mode and accessibility) is split into small segments: the edges

Note that this happens in parallel. At the time of parsing a way we can't even tell if it references a non-existent node.

In that context consider that calling the interpolation inside the way_function does not work: We don't have the node coordinates to do interpolation.

You could solve this by parsing all the nodes first, before you start processing the ways. However we can't keep all parsed nodes in memory. That is why we need to use data structure that saves nodes to disk (all the stxxl::vector stuff). In order to access the nodes while parsing the ways we need random access on this vector. This is however horribly slow since it does disk access. If we want to do this efficiently we need to access the nodes in order. We can do this because we split the way into edges. That way we can run through the node array and assign the correct coordinates to all edges that start with the given node id and in a second run do the same for the second node id. (algorithmically you can do this the same way, as you would when merging two sorted sequences)

Now consider calling the interpolation from the node_function. In this case we have the coordinate so we can actually get a value. So we store the value we computed with the node. However the same problem as above applies: We don't know about the nodes when calling the way_function. There is no way we can utilize the data we just computed. The first time we can use it, is when we bring ways and nodes together. But at that time the ways are already split into separate edges.

In conclusion: If we want to expose this to the lua profile we can't do it over any of the existing functions, we need a third function.

The question is: What data does the third function need be able to work efficiently? IMHO it would suffice if we would also expose the travel mode. In that case you can set the travel mode to a specific value for a way that you don't want to add additional penalties on.

emiltin commented 9 years ago

the way functions does not have to rely on any data computered in the node function. actually i'm not even sure the node function needs interpolation.

TheMarex commented 9 years ago

the way functions does not have to rely on any data computered in the node function. actually i'm not even sure the node function needs interpolation.

Hm, okay. Then I don't understand your approach. How would an interpolation call from the way function look?

emiltin commented 9 years ago

i was thinking that from the way_function you can interpolate data values for the nodes, as long as you just have the coordinates. eg lua has access to an c++ method like:

float interpolate_data(string layer, float lat, float lon);

if you consider elevation, i'm not sure there's much point in interpolation data for a single node. what's interesting is the inclination of a way, ie between nodes. so for elevation i don't think you would need interpolation from the node_function. (but of course you could as well provide access to the same interpolation methods if you have the coordinate.)

as an example, in the way_function, you might call interpolate_data() for the start and end points, then compute the difference to get the inclination, then use that in combination with relevant tags to decide the final impedance.

however, if the way_function is only called for each way, not for each segment you would not be able to set different weights for each segment, which probably would not work, since a long way can have hills midway. in that case i agree that somehow it most be possible to process each segment, either by calling way_function for each segment, or with a new function (segment_function?) maybe segment_functions could be called before way_function so way_function has access to the results?

also, if multiple ways share nodes you might end up interpolating the same point more than once. if that's a performance issue, some sort of caching might aleviate this.

my main point is that if processing is split into separate steps that cannot share data, you loose flexibility in how you can decide the final edge weights.

TheMarex commented 9 years ago

as an example, in the way_function, you might call interpolate_data() for the start and end points, then compute the difference to get the inclination, then use that in combination with relevant tags to decide the final impedance.

The problem is that we don't know the coordinates for the start and end points when we call the way_function. We only know the node ids. Solving this would require making parsing serial: First all nodes, then all relations and at least all ways. (Parsing changed significantly after the libosmium merge, this probably used to be different)

my main point is that if processing is split into separate steps that cannot share data, you loose flexibility in how you can decide the final edge weights.

Yes I agree. It is the less flexible option. We can allow for limited data sharing:

Travel mode
Current edge weight

Which of course is not much, but would at least make stuff like turning height penalties on/off based on travel mode.

So my proposal:

Introduce a new function to the profiles segment_function(start, target, mode, weight)
Add a new function to lua:
- get_raster_data(source_id, lat, lon) which returns the value at the nearest position
- get_raster_interpolate(source_id, lat, lon) which returns a bilinear interpolated value
- load_raster_data(filename) which returns the source_id (or -1 if not found)
Call the segment_function from ExtractionContainers

@lbud would you like to jump on this?

First step would be to:

Implement the utility functions in C++ (you need to figure out where to put the loaded sources)
Add unit tests for them (can be found in unit_tests/) (build using make tests)
Bind them to the lua_state (check out extractor/scripting_environment.cpp)
Add a call to the segment_function in ExtractionContainers
Extent the bicycle profile to load SRTM data

emiltin commented 9 years ago

would performance be affected significantly if we read nodes first? we might be able to increase parallelization of node parsing if we're not reading ways at the same time?

TheMarex commented 9 years ago

would performance be affected significantly if we read nodes first?

The problem is that afaik there are no guarantees of the order of elements inside a PBF and even inside a single PBF block. So either we read the file twice or we cache the parsed ways in memory and call the lua function later. The problem is that caching all the ways needs a significant amount of memory since we need to store all the tag information. In short: Yes it will be significant slower. We currently already have perfect parallelization.

@lbud to capture some of the concerns in chat:

For now assume the most simple case:
- data has to fit in main memory
- data is encoded in the most simple format possible (so no GDAL and similar is required):
- Each pixel is a 16bit signed short
- First 8 bytes (two 32bit unsigned values) encode the dimensions of the image (in pixels)
- Elements are stored row-wise. That is the entry m[i][j] is stores at position width * i + j
we should use load_raster_data(filename, min_lat, min_lon, max_lat, max_lon) to specify the bounding box of the raster source
As interpolation you can use bilinear interpolation

joto commented 9 years ago

I don't know all the details of OSRM processing, just commenting here on the problem of getting the node locations from the nodes and putting them in the ways:

Reading all the nodes from a planet PBF file takes on the order of a few tens of minutes on a fast machine. Storing the node locations in RAM needs currently about 25GB (8 bytes per node). So reading all node locations into RAM first, then reading all the ways and adding the locations to the ways is certainly doable and reasonably fast. This has been the usual way of doing that in many projects and Osmium has all the necessary code to do that.

If you don't have the 25GB spare memory, this could also be done on disk (also supported in Osmium). But this would obviously slow things down considerably. Or it could be done, for instance, on two machines, one reads the nodes and keeps the location cache, adds the location information to ways and streams the ways to a second machine which does the rest of the processing.

In any case, this does not store any other information from the nodes (such as whether they have a traffic light), but only a tiny part of all the nodes contain any interesting tags for routing, so those can be handled differently.

All of this would work with one pass through the file. If multiple passes are needed that isn't a big problem either. Reading a PBF file isn't that fast (I am already working on making it faster), but it isn't that slow either. And Osmium has built-in optimizations for when you are only interested in specific types of objects. So if you only want ways, Osmium will not completely parse the nodes which usually come before the ways in the file, but short-circuit that part of the code. (And if that isn't good enough we can always keep nodes, ways, and relations in separate files.)

gberaudo commented 9 years ago

Some time ago we proposed the elevation patch #1095, which I just closed. This patch was about supporting high precision 3D vector elevation data.

In addition, we also worked on a small project using 2.5D SRTM/OSM data; we implemented the following lua script to request elevation from postgis: https://github.com/camptocamp/provelobern_bicyclerouteplanner/blob/master/routing/profiles/lib/elevation.lua.

It was good enough for us but we would of course enjoy having an official way to achieve this. Which format would be supported by the load_raster_data() mentioned in https://github.com/Project-OSRM/osrm-backend/issues/1102#issuecomment-104407777?

At least for development mode, we often generate (Switzerland) networks on machines with little RAM memory (4GB). I think it would be a valuable bonus if the solution could degrade reasonably on such common machines.

emiltin commented 9 years ago

In the branch I did for parsing relations, the file was parsed twice, and this did not seem to incur a big slow down:

https://github.com/Project-OSRM/osrm-backend/commit/46d19ac8b4a7a387d6e6b965754ebe4a7c6b0d0b#diff-fbab3b79ab0d2d1fc5e48c8d695174e7R177

TheMarex commented 9 years ago

This is now present in develop over the raster source feature in lua profiles.