Project-OSRM / osrm-backend

Open Source Routing Machine - C++ backend
http://map.project-osrm.org
BSD 2-Clause "Simplified" License

Too many table coordinates error in table service - with 2 coordinates ?? #5454

Closed: oberon-oss closed this issue 2 months ago

oberon-oss commented 5 years ago

We have our own OSRM service running. The setup comprises a proxy server and two nodes providing the actual service

We occasionally want to load larger amounts of data that we cache, for example something like https://example.com/osrm/ch/nl/table/v1/car/6.746164,52.121572;6.789958,52.121059.

These requests can use up to a few thousand coordinates, so we run the osrm-routed service with /usr/local/bin/osrm-routed --shared-memory --dataset-name NL --max-table-size 65536

And that has been working (albeit somewhat time-consuming), but today we got reports that calls are now being rejected with {"message":"Too many table coordinates","code":"TooBig"}.

It now even fails with short queries, with only TWO coordinates.

The logging does not reveal any details, and restarting the datastore and osrm-routed programs made no difference, so we are a bit at a loss as to what is going on.

danpat commented 5 years ago

@FHDumay I can't think of a reason it would fail, as long as you're passing the --max-table-size option.

Things I'd check for:

  1. Make sure requests are actually hitting the osrm-routed process you think they are. osrm-routed runs with the SO_REUSEPORT socket option, which allows multiple osrm-routed processes to listen on the same port number, and the kernel will round-robin connections between them. It's not unheard of to accidentally start up a "testing" osrm-routed process on the same port and have it receive half your requests.
  2. Verify that the request re-writing is working as expected. I see you're adding a path prefix of /osrm/ch/nl - ensure that it's removed before being forwarded to the osrm-routed backends. I assume you've checked this, and I wouldn't expect the TooBig error if this was broken, but it's something to check.
jcoupey commented 5 years ago

My guess is that replacing with --max-table-size 65535 will solve the problem. ;-)

The --max-table-size 65536 value looks suspicious. Using 2^16 probably causes an overflow when the value is parsed at

https://github.com/Project-OSRM/osrm-backend/blob/535647e439f823f693a0c29ed9a8315104b52a0b/src/tools/routed.cpp#L139-L140

So no request size will ever be lower than the max value stored.

oberon-oss commented 5 years ago

@jcoupey @danpat Thanks for the replies. I changed it to 65535, and it works. Curious, though, why it is declared as an int yet behaves as if it can only handle unsigned short values (uint16). I was under the impression that default int values were 32 or 64 bits in C/C++. I looked at the exact same piece of code shown above to see why the value was giving problems. Is this a design decision, i.e. it was never expected (or not supported) to have table sizes exceed the 64k limit?

Regards, Fabien H. Dumay

danpat commented 5 years ago

@FHDumay I think @jcoupey 's suspicion is on the money, but I have no idea why. I've done plenty of >64k coordinate table requests via the HTTP API and haven't hit this issue. You're right, int should be at least 32 bits on most platforms.

What platform are you running this on?

jcoupey commented 5 years ago

@danpat just did a quick check on my laptop running Ubuntu 18.04 and I'm hitting this issue too, i.e. --max-table-size 65536 overflows, resulting in TooBig responses, while --max-table-size 65535 works just fine.

danpat commented 5 years ago

Well dang, now I doubt my memory.

I think the problem is probably here:

https://github.com/Project-OSRM/osrm-backend/blob/master/src/engine/plugins/table.cpp#L65

I think the multiplication of max_locations_distance_table * max_locations_distance_table is overflowing before the static_cast.

admirabilis commented 5 years ago

Just increasing this value will not increase RAM usage, right? I understand this will happen only when a larger request comes in.

danpat commented 5 years ago

@teresaejunior That's correct - additional memory is only used when a large request is actually made, and the memory used there is released once the request is complete.

A while back I measured the rough memory required for various table requests (https://github.com/Project-OSRM/osrm-backend/issues/5181#issuecomment-417135981). It's about 75MB of additional memory used for 1000x1000, and 2GB used for 5000x5000.

github-actions[bot] commented 3 months ago

This issue seems to be stale. It will be closed in 30 days if no further activity occurs.