Closed samcrawford closed 4 years ago
I've had no further time to look into the code that's causing this, but I plan to debug it further myself tonight.
To satisfy myself that it wasn't just my faulty code that was causing this, I tried to reproduce this against api.openrouteservice.org with a few requests - and I can confirm the same issue is seen there. Sometimes the extras
response element is missing or contains erroneous data.
for i in `seq 1 3`; do
curl 'https://api.openrouteservice.org/pdirections?api_key=KEY_HIDDEN&attributes=detourfactor%7Cpercentage&coordinates=9.437256,52.113252%7C10.447998,47.465236&elevation=true&extra_info=steepness%7Cwaytype%7Csurface&geometry=true&geometry_format=geojson&instructions=true&instructions_format=html&language=en-US&preference=fastest&profile=foot-walking&units=m' -H 'Accept: application/json, text/plain, */*' -H 'Referer: https://maps.openrouteservice.org/directions?n1=49.777717&n2=8.728638&n3=7&a=52.113252,9.437256,47.465236,10.447998&b=0&c=0&k1=en-US&k2=km' -H 'Origin: https://maps.openrouteservice.org' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36' -s -o series_${i}.json
done
List the results...
[sksam@dev1 ors]$ ll | head -4
total 1708
-rw-rw-r-- 1 sksam sksam 592692 Jul 31 12:20 series_1.json
-rw-rw-r-- 1 sksam sksam 558167 Jul 31 12:20 series_2.json
-rw-rw-r-- 1 sksam sksam 592692 Jul 31 12:20 series_3.json
Observe that series_2.json is smaller than the others. Presumably this is because my 2nd API request happened to collide with some other real user running a request.
Now observe that the differences between the files is the contents of the extras
node:
# Good
[sksam@dev1 ors]$ grep -o -E "extra.{120}" series_1.json
extras":{"surface":{"values":[[0,93,3],[93,95,2],[95,131,4],[131,165,3],[165,167,14],[167,174,1],[174,184,0],[184,186,3],[186
# Bad - presumably an exception was thrown whilst building the extras, so it's empty here
[sksam@dcs-dev ors]$ grep -o -E "extra.{120}" series_2.json
extras":{},"bbox":[9.327165,47.4637,10.452246,52.114111]}],"bbox":[9.327165,47.4637,10.452246,52.114111],"info":{"attribution
If anyone has any thoughts in the meantime feel free to share them. Otherwise I will share a fix when I find it.
I've found the cause of this.
'Extra' information is added into responses using ExtraInfoProcessor, which is injected into GraphHopper's normal processing flow in ORSGraphHopper. This injection is done in a bit of a hacky way (as the comments acknowledge) via the TranslationMap.
The issue is that ORSGraphHopper is a singleton, and the TranslationMap hash map is defined at the class level. This is a problem because the following lines are found within calcPaths(...) and operate on a per-request basis (potentially concurrently):
Translation tr = getTranslationMap().getWithFallBack(locale);
...
if(tr instanceof TranslationMap.ORSTranslationHashMapWithExtendedInfo){
((TranslationMap.ORSTranslationHashMapWithExtendedInfo) tr).init(encoder, weighting, request.getPathProcessor());
}
So when you have multiple routes being plotted concurrently, you end up calling init() passing in different encoder, weighting and ExtraInfoProcessor instances to a single shared ORSTranslationHashMapWithExtendedInfo instance. This results in data getting very muddled between simultaneous requests!
I have fixed this in my own local instance by changing:
if(tr instanceof TranslationMap.ORSTranslationHashMapWithExtendedInfo){
((TranslationMap.ORSTranslationHashMapWithExtendedInfo) tr).init(encoder, weighting, request.getPathProcessor());
}
to:
if(tr instanceof TranslationMap.ORSTranslationHashMapWithExtendedInfo){
tr = new TranslationMap.ORSTranslationHashMapWithExtendedInfo(tr.getLocale());
((TranslationMap.ORSTranslationHashMapWithExtendedInfo) tr).init(encoder, weighting, request.getPathProcessor());
}
I have tested this by running multiple concurrent routes (using the same code I put in my original report), and I now see the same correct values and no exceptions being thrown.
I will raise a PR for this shortly, but I suspect there might be a more elegant fix possible.
Thank you very much for reporting this issue and for the thorough investigation! We are currently in the process of upgrading our core to a newer GraphHopper version. As we have restructured the way how the extra-info is passed, we have to re-evaluate how your fix fits in there.
No problem! And looking at the development-v6 branch, I agree that it looks like the changes there would make this change redundant (see https://github.com/GIScience/openrouteservice/blob/development-v6/openrouteservice/src/main/java/heigit/ors/routing/graphhopper/extensions/ORSGraphHopper.java)
Here's what I did
I am writing an application that leverages ORS, and one of the things the application does is calculate a number of possible routes for the user to take (typically ~10). Running these in series is quite slow, but running them in parallel across multiple cores is very quick.
Here is a snippet of the code I'm using to perform route calculation in parallel:
The createRequest function includes this:
In the constructor for ScoredRouteResult we call (vastly simplified code):
It's the retrieval of the extra green info (concurrently) that triggers the issues.
Here's what I got
If I run the code in series (i.e. just a straight for-loop instead of the parallelised code pasted above), everything works fine, and is consistently fine, but takes a while (~6 seconds).
If I run the code in parallel using the ExecutorService (using the code above), it completes a lot quicker (~1 second), but I see the following issues:
Sometimes everything appears to work successfully. The .getDistance() call for each route matches between the parallel version and the serial version - so I'm happy route calculation is parallelisable. However, the getExtraInfo(...) call to retrieve the greenness of the route returns erratic values. Summing the green values for the same route produces different totals each time with the parallelised version.
Sometimes an exception is thrown, which can be one of the following two:
This suggests to me there is some concurrency issue going on with the Extra Info Processor, but after a couple of hours debugging, I can't immediately see where. Hence I'm seeking assistance in tracking down this issue.
Here's what I was expecting
I was expecting the results of the parallel and serial execution to be the same, and I wasn't expecting to receive exceptions.
Here's what I think could be improved
I suspect there is a concurrency issue with the handling of ExtraInfo on routes, and I think we should try to make this threadsafe.