gis-ops / docker-valhalla

This is our flexible Docker repository for the Valhalla routing engine
MIT License

Difference in route response of hosted service vs self #147

Closed by Bryze 3 months ago

Bryze commented 3 months ago

The response served by the hosted Valhalla API differs from the one we get from our local Docker container.

The following cURL request works fine and returns the expected response:

curl --location 'https://valhalla1.openstreetmap.de/route' \
--header 'Content-Type: application/json' \
--data '{
    "locations": [
        {
            "lat": 28.449684,
            "lon": 77.095215
        },
        {
            "lat": 28.4254605,
            "lon": 77.0929267
        }
    ],
    "costing": "auto"
}'

But after pointing the same request at our local Docker container, the same coordinates return an error. cURL:

curl --location 'http://localhost:8002/route' \
--header 'Content-Type: application/json' \
--data '{
    "locations": [
        {
            "lat": 28.449684,
            "lon": 77.095215
        },
        {
            "lat": 28.4254605,
            "lon": 77.0929267
        }
    ],
    "costing": "auto"
}'

response:

{
    "error_code": 499,
    "error": "Unknown: Could not find candidate edge used for destination label",
    "status_code": 400,
    "status": "Bad Request"
}

Note: The request isn't always failing. It works for a few combinations as well.
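
One way to map out which combinations fail is to send each coordinate pair, in both directions, to both instances and compare the outcomes. A minimal Python sketch using only the standard library follows. The helper names (`build_route_request`, `post_route`, `describe`) are hypothetical, and it assumes that a failure comes back as a JSON body with an HTTP error status, as in the 400 response above.

```python
import json
import urllib.error
import urllib.request


def build_route_request(origin, destination, costing="auto"):
    """Build a /route payload from two (lat, lon) tuples."""
    return {
        "locations": [
            {"lat": origin[0], "lon": origin[1]},
            {"lat": destination[0], "lon": destination[1]},
        ],
        "costing": costing,
    }


def post_route(base_url, payload, timeout=30):
    """POST the payload to <base_url>/route and return the decoded JSON body.

    Assumption: error responses also carry a JSON body, so the body of an
    HTTPError is decoded instead of letting the exception propagate.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        return json.loads(err.read().decode("utf-8"))


def describe(body):
    """One-line summary of a /route response or error body."""
    if "error" in body:
        return f"error {body['error_code']}: {body['error']}"
    summary = body["trip"]["summary"]
    return f"ok: {summary['length']} {body['trip']['units']}, {summary['time']} s"
```

Calling `describe(post_route("http://localhost:8002", payload))` and `describe(post_route("https://valhalla1.openstreetmap.de", payload))` with the same `payload`, then again with origin and destination swapped, gives four lines that can be compared directly.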

It feels like an OSM data issue, but I'm not sure, since I downloaded the data and built the tiles using the steps mentioned in the docs:

mkdir custom_files
wget -O custom_files/india-latest.osm.pbf https://download.geofabrik.de/asia/india-latest.osm.pbf
docker run -dt --name valhalla_gis-ops -p 8002:8002 -v $PWD/custom_files:/custom_files ghcr.io/gis-ops/docker-valhalla/valhalla:latest

I have tried downloading and replacing it manually as well but to no avail

Could anyone point out what's wrong / missing here?

nilsnolde commented 3 months ago

> "error": "Unknown: Could not find candidate edge used for destination label",

hm, I have a clue how this could possibly happen. I could imagine that the reverse search only has a single edge in its expansion when the connection is found (that edge would be the initially correlated one), which we then scrub while stitching together the final path (to avoid duplication with forward search). then when asking for the percent_along of the destination's correlated edge, we can't find it because that edge is not part of the path anymore.

it's the only way I can see how this could happen. it should be a very rare case though: we don't use the bidir A* for trivial routes or routes over connected edges. it seems only possible when the destination's correlated edge is super long and the origin's correlated edge is short and connected to other short edges, in which case the forward search would find the destination edge before the reverse search had a chance to settle any edge other than its correlated one. the bidir A* picks the min cost of both trees to decide which one to expand next, and I think in the case I described it'd expand forward quite a few more times before ever starting the reverse search.
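
To make the expansion rule above concrete, here is a toy sketch in Python. It is a plain node-based bidirectional Dijkstra on an undirected graph, not Valhalla's edge-based bidirectional A*, and the function name and the five-node graph are made up for illustration. It only shows how always expanding the tree with the smaller minimum cost can leave the reverse tree with nothing settled but its start when the destination sits at the end of one very long edge.

```python
import heapq


def bidirectional_settles(adj, source, target):
    """Toy bidirectional Dijkstra; returns (best_cost, settle_order).

    settle_order lists (side, node) pairs in the order nodes were settled,
    where side is "fwd" (from source) or "rev" (from target).
    """
    queues = {"fwd": [(0.0, source)], "rev": [(0.0, target)]}
    dist = {"fwd": {source: 0.0}, "rev": {target: 0.0}}
    settled = {"fwd": set(), "rev": set()}
    best = float("inf")
    order = []
    while queues["fwd"] and queues["rev"]:
        # Stop once no undiscovered connection can beat the best one found.
        if queues["fwd"][0][0] + queues["rev"][0][0] >= best:
            break
        # Expand whichever tree currently has the cheaper front label.
        side = "fwd" if queues["fwd"][0][0] <= queues["rev"][0][0] else "rev"
        other = "rev" if side == "fwd" else "fwd"
        cost, node = heapq.heappop(queues[side])
        if node in settled[side]:
            continue  # stale queue entry
        settled[side].add(node)
        order.append((side, node))
        for nxt, weight in adj[node]:
            new_cost = cost + weight
            if new_cost < dist[side].get(nxt, float("inf")):
                dist[side][nxt] = new_cost
                heapq.heappush(queues[side], (new_cost, nxt))
            # A connection exists if the other tree has already reached nxt.
            if nxt in dist[other]:
                best = min(best, new_cost + dist[other][nxt])
    return best, order


# Hypothetical graph: three short edges (cost 1), then one long edge (cost 100).
toy_graph = {
    0: [(1, 1)],
    1: [(0, 1), (2, 1)],
    2: [(1, 1), (3, 1)],
    3: [(2, 1), (4, 100)],
    4: [(3, 100)],
}
best_cost, order = bidirectional_settles(toy_graph, 0, 4)
print(best_cost, order)
```

On this graph the reverse side settles only its start node 4 before the searches connect, while the forward side settles three nodes. Whether the same pattern is what triggers the 499 error in Valhalla is exactly the open question here.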

btw what do you mean with

> Note: The request isn't always failing. It works for a few combinations as well.

those exact coordinates aren't always failing? that'd be more troublesome and hard to imagine a reason for it.. or you're saying you can replicate the same error with many other coordinates as well, but still a few coordinate combinations are working fine? or is it really only this one example which fails in this way? that'd be my hope.

anyways, this looks easily testable. just needs someone to do it. I'm busy with other stuff atm. I'd recommend to open this in upstream valhalla and best link to here.

nilsnolde commented 3 months ago

oh sorry, my whole analysis above probably doesn't apply to your problem (though what I described still sounds like a potential bug), as you said the public instance works fine and that code is pretty much the same. hmpf.. I just can't see how that's a data problem. even tiles that are corrupted in some way can't lead to this error AFAICT..

Bryze commented 3 months ago

> or you're saying you can replicate the same error with many other coordinates as well, but still a few coordinate combinations are working fine?

Well, here's where it gets interesting: if I swap the source and destination coordinates in the example, the local instance is able to give me a route. cURL:

curl --location 'http://localhost:8002/route' \
--header 'Content-Type: application/json' \
--data '{
    "locations": [
        {
            "lat": 28.4254605,
            "lon": 77.0929267
        },
        {
            "lat": 28.449684,
            "lon": 77.095215
        }
    ],
    "costing": "auto"
}'

The response comes out as:

{
    "trip": {
       ....
        "summary": {
            "has_time_restrictions": false,
            "has_toll": false,
            "has_highway": false,
            "has_ferry": false,
            "min_lat": 28.430457,
            "min_lon": 76.996988,
            "max_lat": 28.449673,
            "max_lon": 77.006098,
            "time": 242.666,
            "length": 2.823,
            "cost": 435.869
        },
        "status_message": "Found route between points",
        "status": 0,
        "units": "kilometers",
        "language": "en-US"
    }
}

And trying the same request against the hosted service. cURL:

curl --location 'https://valhalla1.openstreetmap.de/route' \
--header 'Content-Type: application/json' \
--data '{
    "locations": [
        {
            "lat": 28.4254605,
            "lon": 77.0929267
        },
        {
            "lat": 28.449684,
            "lon": 77.095215
        }
    ],
    "costing": "auto"
}'

generates the following

{
    "trip": {
     ...
        "summary": {
            "has_time_restrictions": false,
            "has_toll": false,
            "has_highway": false,
            "has_ferry": false,
            "min_lat": 28.424484,
            "min_lon": 77.092924,
            "max_lat": 28.449894,
            "max_lon": 77.105144,
            "time": 529.657,
            "length": 4.601,
            "cost": 1946.674
        },
        "status_message": "Found route between points",
        "status": 0,
        "units": "kilometers",
        "language": "en-US"
    }
}

Comparing the two, the lengths differ: 2.8 km from the local instance vs 4.6 km from the hosted one (which is correct).
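
The two summaries above also differ in their bounding boxes: the local one spans longitudes of roughly 76.997 to 77.006, while both requested points lie near longitude 77.09. A small Python sketch can quantify how far each requested point is from the box of the returned route. The function names are hypothetical, and clamping in latitude/longitude is only an approximation of the nearest point on the box, which should be adequate at this scale.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def distance_to_bbox_km(lat, lon, summary):
    """Approximate distance from a point to the summary's bounding box (0.0 if inside)."""
    nearest_lat = min(max(lat, summary["min_lat"]), summary["max_lat"])
    nearest_lon = min(max(lon, summary["min_lon"]), summary["max_lon"])
    return haversine_km(lat, lon, nearest_lat, nearest_lon)


# Bounding boxes copied from the two responses above.
local_summary = {"min_lat": 28.430457, "min_lon": 76.996988,
                 "max_lat": 28.449673, "max_lon": 77.006098}
hosted_summary = {"min_lat": 28.424484, "min_lon": 77.092924,
                  "max_lat": 28.449894, "max_lon": 77.105144}
requested = [(28.4254605, 77.0929267), (28.449684, 77.095215)]

for lat, lon in requested:
    print(
        (lat, lon),
        "local:", round(distance_to_bbox_km(lat, lon, local_summary), 2), "km",
        "hosted:", round(distance_to_bbox_km(lat, lon, hosted_summary), 2), "km",
    )
```

Both requested points fall inside the hosted route's box but several kilometers outside the local one. One possible reading, not a confirmed diagnosis, is that the local instance matched the points to roads far from where they actually are.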

> I just can't see how that's a data problem

The variance in the results made me wonder: could it be that nodes/edges are missing from the Geofabrik file?

> anyways, this looks easily testable. just needs someone to do it. I'm busy with other stuff atm. I'd recommend to open this in upstream valhalla and best link to here.

Will do so, but what are your thoughts on it?