Project-OSRM / osrm-backend

Open Source Routing Machine - C++ backend
http://map.project-osrm.org
BSD 2-Clause "Simplified" License

Introduce request timeout for the osrm-routed server #4156

Open gardster opened 7 years ago

gardster commented 7 years ago

Some plugins (like match) can take a lot of time to process requests. For some input combinations this can lead to very long processing times (even if the connection is lost, the server will still compute the request). I suggest introducing a configurable request timeout to cancel long-running requests.

kostadin24 commented 7 years ago

You would have to maintain this dynamically. The time to compute a 100x100 matrix is small, but a 2000x2000 matrix takes 30-40 times longer. So the timeout limit could either be a hard value, or a soft one computed from the amount of work.

kostadin24 commented 7 years ago

You can either split the work into several requests, or you really do have to control the timeout.

habibutsu commented 7 years ago

@kostadin24 You suggest estimating the timeout before the request, but it's not clear how. From your example, would the case that takes 30-40 times longer fall within the allowed timeout range or not?

For example, suppose I have several heavy requests and several easy requests. Because I cannot specify a timeout, the heavy requests can overload the server, and as a result I may end up with an unresponsive system. On the other hand, if I could specify timeouts, I would get errors for the heavy requests, but the easy requests would be served normally.

I think this parameter is needed for all API requests, but by default it could be unlimited.

TheMarex commented 7 years ago

I looked into this a while ago; it is a bit of a headache to implement and has some nasty architectural implications.

If you want to set a timeout for every request, there needs to be a thread that keeps track of all the timeouts and somehow communicates with the worker thread of each request. However, libosrm does not run its own thread pool; we simply offer a thread-safe API.

Even if we go the route of setting up a dedicated timeout tracking thread, we would need to ensure that every worker thread periodically checks in to see if it was preempted. This adds additional overhead and would basically need to hook into every hot loop of the routing algorithms to even have a chance of hitting these preemption points periodically.

And finally, even if we went through all this trouble, we would still shoot ourselves in the foot under load: since we can only measure the time between request arrival and now, not the actual CPU time spent computing the request, an overloaded system could end up with a lot of half-computed requests that all time out (no guarantee of progress).

However, in an open production system you will need to deal with this problem, so I can share what we do in practice, which works well:

  1. Set limits on the API in such a way that you can predict the maximum time it takes to compute any request.
  2. Implement a rate limit that ensures that even when a single user floods you with worst-case requests you don't go over capacity.
  3. Implement a scaling policy that can bring up more capacity on sustained high load.

These are quite generic, but for OSRM, (1) can basically be achieved by limiting the number of coordinates for specific parameter settings (e.g. max 15 coordinates for /route, 50 coordinates for /table, etc.).

gardster commented 7 years ago

Thank you for the detailed explanation.

  2. Implement a rate limit that ensures that even when a single user floods you with worst-case requests you don't go over capacity.

I suggest introducing an additional limitation, a maximum coordinate radius parameter in the match plugin (#4154), because this parameter is strongly coupled with performance. But I don't know which approach is better: to process the track with radius values clamped to the limit, or to return an error if values in the request exceed the limit.

github-actions[bot] commented 3 weeks ago

This issue seems to be stale. It will be closed in 30 days if no further activity occurs.