Find missing routes in OSM

HarelM commented 7 years ago

We should implement a service that can do the following:

Given a GPS trace the server should be able to return a list of routes that needs to be added to the map.
after adding data to the routes (like type, color etc) the server should be able to commit this change to OSM database using the user credentials. This is a spin-off of #258.

GPS simplification article, this is the easy part: http://www.gpsvisualizer.com/tutorials/track_filters.html

I'm still looking for an algorithm that can find missing edges in a linked graph. I'll post things I though about. Using overpass might help with getting the existing data ,or maybe the OSM API itself.

HarelM commented 7 years ago

Here's my initial idea:

[x] 1. Get data from overpass server of all the highways in the bounding box of the GPS route.
[ ] 2. Translate this data in an adjacent matrix - row and columns are all the nodes and 1 and 0 in the matrix are used to signal if they are connected.
[ ] 3. [Optinal] Condense this matrix by joining neighbor points.
[ ] 4. Simplify GPS trace.
[ ] 5. For every point in the GPS trace find the closest node in the above matrix, if the closest node is with-in a tolerance split the GPS trace at that point.
[ ] 6. Ignore all parts that contain only one node. At this stage we have a list of suspected GPS trace parts and we need to find out which of these parts is already mapped.

I'm still not sure how to do this final part, it may require some sort of routing and comparing the routed route to the GPS trace part and see how different they are in terms of distance etc.

I'll start with 1-6 and see if I can present the results on a map and continue from there...

zstadler commented 7 years ago

I'm not sure I understand the need for the adjacency matrix for finding the candidate missing ways.

IMO, the following pseudo-code seem to provide similar results:

Download the geometry data for all OSM ways with a "highway" tag within a range from the GPS track (or from its bounding box).
Mark (not delete) each GPS track point that is within the tolerance from any of the downloaded OSM ways.
Find "long enough" segments of non-marked track points in the GPS track.

The marked points of the traces can later be used to connect the new way to existing ways.

HarelM commented 7 years ago

Yes, The matrix part might be needed in a later phase to reduce the accuracy in order to get better performance, but we can ignore it for now. The main question here is how do we find these "long enough" segments? This is what i referred to as the final part.

zstadler commented 7 years ago

Step 3 in my proposal finds "long enough" segments. These are the sub-sequences of non-marked nodes in each track that have a length of 100 meters or more.

Find "long enough" segments of non-marked track points in the GPS track
- For each GPS track, start at the first non-marked node "start" and repeat:
  - Find the track sub-sequence "candidate" upto, but not including, the next marked node
  - Save the candidate if its length is at least a 100 meters (or some other value)
  - Find the next non-marked node, if any

HarelM commented 7 years ago

Easy enough, I like it. 100m sounds like a good place to start, we can make it configurable. What do you think should be the tolerance for non-marked nodes? 10m radius? I have an initial code that might be able to do the above (not add it to OSM yet, but at least show the missing tack on the map). I need a candidate for testing, do you have one?

HarelM commented 7 years ago

More fine tuning might be needed... Here is a test on a mapped track - circles are 15 radius, minimal length is 100m. It might be that we need a 30m radius and a 250m minimal length. I'm thinking on using the time on the gpx to determine velocity - thus type of track. I prefer to miss a trails that needs to be mapped instead of thinking we need to map an already existing trail, so I think the thresholds should reflect it, what do you think?

zstadler commented 7 years ago

Let's start with a 50 meter radius and 250 meter length

HarelM commented 7 years ago

I'll improve the algorithm to remove points within a distance to a line instead of distance to points - it will work a lot better I think. let's see if this will remove the above issues before reducing the tolerance.

zstadler commented 7 years ago

The distance to lines (ways) is what I suggested in step 2 above

HarelM commented 7 years ago

Must have missed it... Here is the results on Ilan's track (http://israelhiking.osm.org.il/#/13/33.1670/35.7241?s=VgpfTbpbjn): We are getting closer :-) There's still an effort that needs to be done in gpx simplification, the following image demonstrate the issue: Any thought are welcome, I'm thinking if "leave-one-point-out" might do the trick:

for every point in the gpx file: 1.1. Find the closes line in the gpx recording without this point 1.2 If it falls inside the tolerance remove this point

zstadler commented 7 years ago

I suggest we do not reinvent the wheel. Let's evaluate the existing algorithms first.

One problem with the above approach is that it could remove corners.

Moreover, the ziggzag could be a legitimate record of a serpentine.

HarelM commented 7 years ago

Easier said than done. I'm not sure both algorithms described in the article above will do the job. I need to implement every algorithm in order to be able to test it. I prefer to implement only one algorithm. My main concern is the following case of going some where and returning from that place on the same recording, which I don't know how to solve...

zstadler commented 7 years ago

I think we should look for sub-segments where the candidate track overlaps itself. One option is to use the Douglas-Peucker algorithm and split the candidate track into sub-sections, and see if they overlap using the same criteria as overlapping with existing roads.

Also, please check the distance calculations in the algorithm. Using the site, it seems like the north-most point in the longer candidate is just 8 meters from the adjacent road: 2016-11-25 19_07_05-clipboard 2016-11-25 19_20_40-israel hiking map

As for inventing wheels, you already found an implementation of the Douglas Peucker algorithm on the web. It is possible that you can find the others too...

HarelM commented 7 years ago

About the 8m distance: I'm currently keeping the closest points in both edges of the trace so I can later link them, I didn't find a good reason to remove them just yet. It's not about finding the implementation, it's about writing it, testing it and making sure it works. I prefer to think about the edge cases and try and "fail" the algorithm before writing it so when I sit down and write I'll need to do it once. It might not be possible in this case, but one can only hope :-).

My current thoughts are as follows (which might also help in regular simplification):

Split route as before.
Simplify using D-P the sub-routes.
on a sub-route: for every point split the route into two routes and see which of the above split creates the shortest route.
either continue recursively or stop here, still not sure: Here is the missing routes after split and D-P simplification, note that there's still a "z" in there which I think should not exists.

zstadler commented 7 years ago

Summary of brainstorming on an algorithm for removal of backtracking segments:

Create a pool of ways from all existing OSM ways within a minimal distance distance from the trace's bounding box
Start at the end of the GPS trace and progress towards the start. This gives priority to the way after completing a backtrack rather than the way into a dead-end.
Find the first two consecutive nodes that are not within a minimal distance from any way in the pool and start a new candidate way.
Finish the candidate segment if the next node is within the minimal distance from any way in the pool
Finish the candidate segment if the next node is within the minimal distance from the candidate way and the turn of this node w.r.t. the 2 previous nodes is between 150 and 210 degrees.
Add the finished candidate to the pool if it is longer than a minimal length
Look for the next candidate by going back to step 3.

The tuning parameters in this algorithm are

Minimal distance between the candidate and other ways
Minimal length of a candidate
Angle of turn indicating a potential backtrack

HarelM commented 7 years ago

The above approach does cover some cases I tried, it doesn't cover the following unfortunately: Maybe the algorithm can accumulate directional change over distance so that sharp turns are caught even if they happen in more than one GPS measurement, what do you think? (Blue line is what the algorithm think should be added, red is the original route). Here is the relevant gpx file: 90-deg.gpx.txt

Here is the results on Ilan's track. I still need to combine separate route parts when they are sequentail or something:

HarelM commented 7 years ago

OK, accumulative angle might do the trick if combined with looking for the same line again after the accumulated angle is big enough. Consider the following case:

zstadler commented 7 years ago

There is a track connectivity phase after the candidate selection by the user.

I don't see value in connecting tracks in two steps. It may even lead to undesired behavior by combining a desired segment with an undesired one.

What are the specifics of the "accumulated angle" mechanism?

HarelM commented 7 years ago

Removed the angle part of the algorithm. currently all the tests I made came out very good, I'll upload to the test site so we can try and get the feeling of it.

zstadler commented 7 years ago

Unfortunately, currently I don't have a lot of unmapped trails in my traces. Nevertheless, the analysis has found a few places. All places were off the mapped roads - no false positives!

For immediate improvement:

The resulting candidates are over-simplified - worse than manual tracing...
The current performance is not suited for near-real-time approach. For day-long traces traveling by a vehicle, trace analysis takes a very long time. Perhaps the algorithm takes all ways within the trace's bounding box rather than ways within a distance from the trace itself!?

Edit: moved a bug description to the appropriate issue

A few comments about the UX aspects:

A few small candidates are hard to find in a day-long vehicle trace. Perhaps adding a drop icon at the start of each candidate will help. Later on, this drop can serve as the place to act upon the candidate.
The value of a candidate is hard to assess without the context of the original trace. In a few cases the GPS trace should probably replace an entire existing adjacent way which is a rough estimate based on satellite imagery. I suggest adding the analyzed trace to the map while creating a clear visual difference with the candidate.
The trace list popup does not close after clicking the trace analysis button. It is hard to notice that anything is happening. and the busy line at the top of the map is easily ignored. I suggest that when an analysis button is clicked, replace the contents of the popup with a "Looking for new trails in your trace" message and a "busy" animation. A real progress indicator will be much better, since the current progress bar quickly gets stuck a few millimeters from the end.

HarelM commented 7 years ago

Neither do I, which is why I opened this test site, I hope others will have more un-mapped traces. Please ignore the current UX - this is just for testing, I plan a why better UX, I agree with the points you raised though, I'll try and take them into consideration.

Please provide an over-pass query that can do what you want in term of "near trace" - I'm not sure if it's possible - I think this is the main performance bottleneck as in memory calculation is probably pretty fast. I assumed there will be an issue with performance, my though however is to parse the OSM file and keep the highways in the local elastice-search database (along side the names but in a different index) which might improve the performance significantly - it might cause synchronization issues with the full database but this can be handled with the incremental pbf update (maybe). what do you think?

zstadler commented 7 years ago

The best Overpass query I could find is:

(._;node(around:300,30.415,35.138););
(._;node(around:300,30.415,35.139););
(._;node(around:300,30.415,35.140););
(._;node(around:300,30.416,35.140););
(._;node(around:300,30.416,35.141););
// as many lines as needed in the above format

way[highway](bn) -> .w;
node(w.w) -> .n;
(node._.n;way.w);

out qt;

where the representative coordinates are extracted from the GPX trace by keeping only 3 digits to the right of the decimal point and removing duplicates.

Notes:

For a given highway way, nodes that are more than 300 meters away from a representative trace node will not be included in the search results.
Please keep the 300 meter radius as it compensates for the inaccuracy of the nodes locations.

HarelM commented 7 years ago

Can you try and measure the time difference between two queries? Can you run it on the long trace that was problematic when you tried the API? It might help but I have doubts that a long query such as the above is fast...

zstadler commented 7 years ago

I don't know what query is used now, but the new query responds in seconds on the long example - about 5500 GPS points reduced to about 700 representative points. This includes the time to render the JSON results.

HarelM commented 7 years ago

I don't know how to translate the above code to http url, the following is what I use now:

var boundsString = string.Join(",", southWest.lat, southWest.lng, northEast.lat, northEast.lng);
var address = $"http://overpass-api.de/api/interpreter?data=(way[\"highway\"]({boundsString});>;);out;";

zstadler commented 7 years ago

To find the translation to http:

Goto http://overpass-turbo.eu/
Open the browser's network debugging (F12)
Send the above query

Apparently, it is a POST request to http://api.openstreetmap.fr/oapi/interpreter with the query text as its data.

HarelM commented 7 years ago

GET request is usually not a good candidate to pass data in the body (it's not forbidden but it's not a good practice - thus the POST), and you can't easily translate the above to a sequence of key-value parameter which is the standard usage of GET.

In any case I tested the above two requests: The per point took around 4 seconds on a route I tried and the envelope took around 1 sec. I see no reason to use the per point query...

Moreover, if this is the bottle neck, we need to locally do it and not use overpass, IMO. I'll see if I can create a highways database in ES, it should do the trick I think...

zstadler commented 7 years ago

We need to compare the overall analysis performance, not just the Overpass download time.
I think the number of points in the extracted map is impacting the performance because the complexity of the algorithm is O(n*m) where n and m are the number of points in the trace and in the map extraction.

trace analysis takes a very long time. Perhaps the algorithm takes all ways within the trace's bounding box rather than ways within a distance from the trace itself!?

Moreover, I could not reproduce anything like the above performance difference.

The BBOX query returned 3,138KB containing 34,651 nodes and 1,274 ways in 2036ms
The node query returned 285KB containing 2,560 nodes and 122 ways 2903ms

That's 1 second for reducing more than 90% of the data for an algorithm with quadratic complexity.

For this test I've used the Shachatut-Timna bike trail.
See Shacharut-Timna.zip for the trace, the queries, and their results.

Also, are you saying Is there a technical difficulty to issue a POST request instead of a GET!?

HarelM commented 7 years ago

I think this is the main bottle neck of the algorithm, as the rest is done in memory which is very fast. There is no technical difficulty is using POST, I just explained why they use POST instead of GET. I used a line with 400 points for the query. A second is a lot... I managed to insert all the ways to the ES database, I'll post soon the time it takes...

HarelM commented 7 years ago

It takes about 300ms now with ES to find the same highways. Overall algorithm runs in less than a second now. The drawback is that this highways database needs to be updated the same way the other databases are updated (either once a day which is mediocre or with the incremental update which is ideal). This also makes the snapping response faster (it can use this database instead of overpass) :-)

HarelM commented 7 years ago

I'm using an envelope query in ES, this can probably be optimized using polygon query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-polygon-query.html But I'm not sure more optimization is needed beside moving to ES...

HarelM commented 7 years ago

I moved to test south negev 4. this trace has 12K point. ES returns an answer of 500 highways is less than a second which is good. The algorithm to find self loops is O(n^2) where n in the number of points (which is a lot of distance checking). It takes forever just to split the original GPS trace to self loops, a better performing algorithm should be used, hopefully something that is O(n)...

zstadler commented 7 years ago

The algorithm I have in mind has the following stages:

Split into segments of "far enough" points
Search for loops in each segment, each of them should be relatively short

If stage 2 is performed first on the entire trace, there will certainly be an issue with long traces.

Otherwise, in the case of "South Negev 4" trace, there are very few "far enough", which are very short, so this should not be the performance bottleneck.

If the performance issue is in stage 1, then perhaps the original GPS should be first split into short segments, and each segment is scanned and split independently. Perhaps D-P algorithm can be used to define the split points.

HarelM commented 7 years ago

I'll upload the code to the server tonight after hours and run the ES database update so you'll be able to play with it tomorrow.

zstadler commented 7 years ago

I hope you are referring to the test server...

HarelM commented 7 years ago

I don't have a test server for backend. So I upload the code to the production server and hide the relevant buttons in the client side...

zstadler commented 7 years ago

Yes. I meant from the user perspective, it will only be accessible from the test page.

HarelM commented 7 years ago

Site is updated. the algorithm is now lightning-fast! There are issues with the highways database, I still don't understand the root cause, but is seems that there are highways missing. If you see a place where the algorithm was wrong, try and check the snappings around it - you'll probably see that there's no snapping. I'll look into it sometime this week.

HarelM commented 7 years ago

Found the issue, I assumed all highways are LineString (straight lines) while there are highways that are Polygon (closed loop). Will fix it today hopefully.

HarelM commented 7 years ago

Issue fixed along with a bug in snapping which assumed the same incorrect assumption. The algorithm appears to really work! I'm honestly surprised :-). Still missing (please let me know if I missed something):

[ ] UX - we need to carefully think how we want users to interact with this new feature, there's a lot to discuss here, @valleyofdawn, feel free to join this discussing.
[ ] The ability to add a line to OSM - I don't think this is hard since I already know I can easily interact with the OSM API.
[ ] This is a leftover - velocity and thus track type is calculated on all the GPX file but it can be calculated better on the missing part only, each missing part on its own - thus if someone was driving a 4WD and stopped for a walk on a missing trail, this part can be separately thought of as footway instead of track. in any case, this will only be the default selection and the user will have the ability to change it.

zstadler commented 7 years ago

Here is a initial proposal for the UX:

The results are are shown on the map without adding them to a layer, similar to how search results are currently shown. A non-transparent color is used for increasing visibility.
The original trace is added as a layer for reference. The layer uses a different transparent colour.
Actions on a candidate segment are done though its pin icon, similar to accessing actions for search results.
The icon as an "Add to OSM map" action in addition to the existing actions for a line search result - "Convert to Route", "Remove", and "Show Coordinates".
BTW, is there value in a "Show Coordinates" action for a line result!?
The following fields show above the action buttons:
- Mandatory: Road type drop-down: path, track, etc. Its initial value is determined by the segment average speed. Paved road types are used for speeds over 50 km/h.
- Optional: Colour drop-down, Hebrew name, English name, Bicycle accessibility, Trail operator (ITC/KKL/Name)(סימון שבילים/קק"ל/אחר)
For a non-authenticated session, The "Add to OSM map" action first opens the login dialog which will also encourages non-registered users to register.

Unrelated to the UX: When adding a new segment that connects to an existing segment with the same colour, the new segment is added to the route relation with the corresponding "osmc:symbol"=":white:_stripe" tag. Otherwise, a new local hiking route relation is created for a new colour segment.

zstadler commented 7 years ago

@HarelM Please double check the results of "South Negev 4" together with the original trace.

The North-most segment looks like:

The Eastern part of the recording, just after a U-turn should have also been identified by the algorithm.

Can you also avoid the trace simplification at this stage?

HarelM commented 7 years ago

Updated algorithm, there's still improvements to be made, but it covers this issue (without trace merges which is missing), I think. Regarding UX: You can assume the user is logged in to OSM (for all kind of reasons). The trace reference can be the background, but I don't want to put it in the layer control (I'm not sure I miss understood but I believe we are saying the same thing).

Show coordinates for a line is mainly so the user will be able to have a slight idea where the line is (and It came free when I developed it), I think it also creates some similarity so in case the user needs it he'll know where to look. It might not help much, and we can use this button space in this case for the add to OSM button.

I think the optional fields are overwhelming, colour might be the only relevant one. I want as less as possible choices to present the user so that he won't get discouraged. maybe an advanced button can open the rest of the options for advanced users. I think the pop-up menu from the drop might be too small for this usage and we might need a full modal pop-up, I'll see is better... As a side note regarding the operator - I never know what to put...

zstadler commented 7 years ago

The "Highways" trace shows 2 candidates that seem to overlap with existing highways:

Both snapshots were taken in zoom 16

HarelM commented 7 years ago

Can you send me the GPS file please?

HarelM commented 7 years ago

I'll check the files you sent me, meanwhile, I updated the simplification parameter, and now the path is less simplified, let me know if you think it should be more, less or this is it. I also raised the minimal length of a partial loop path to 30m instead of 15, feel free to play with it as you like.

zstadler commented 7 years ago

Can you also check "South Negev 4" at position 30.2094/34.7336? There seems to be a section that was not discovered

HarelM commented 7 years ago

I guess this line is not long enough - minimal length for adding a line is 200m. I must say I'm having trouble with your recordings, they are extremely difficult to split and work on, they contain multiple self loops and I'm not sure how to handle them. I might need to merge them back first after the initial 1000 points optimization split then maybe to simplify with D-P to reduce the number of points then self loop split and then back to remove duplicate points, not sure...

zstadler commented 7 years ago

Yes, it is less than 200 meters.

These are real traces, and reality sucks...

Is it a performance issue or correctness of the results? I don't see how simplification can help with the correctness.

HarelM commented 7 years ago

Reality do suck... The results are not good enough, I still get self loops in a few occasions, the lines are broken to little pieces that I don't want the user to start going over and decide which to add and which not to add. The performance is still OK, but it seems that I need to improve the algorithm to be able to solve the cases you send me, and this will probably require more processing which will lead to worse performance. I think the main issue here is the fact that the lines are broken to much, I 'll see if I can easily solve this. Another issue I need to fix is self loop within self loop - current algorithm splits the line when hitting the first self loop, if within this loop there's another self loop it won't be fixed - which means the algorithm isn't good enough...

IsraelHikingMap / Site

Find missing routes in OSM #267