Closed thomas-riccardi closed 4 years ago
Task.wait()
is not controllable when called from InferenceResource.inference()
You can already control the timeout by doing:
task = InferenceResource.inference(return_task=True)
task.wait(timeout=123)
data = task['data']
I wanted to avoid such behavior because I would like to have a synchronous API for inference at some point, which would return directly the data
field. But maybe we should make the above behaviour the default for now to better reflect the API ?
Reopening: we still need timeouts at the network level: http://docs.python-requests.org/en/master/user/advanced/#timeouts
Currently retrying is mainly done using tenacity (quite flexible). Though maybe we should combine it with https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#urllib3.util.retry.Retry ? Not sure if requests does some optimization depending on the exception.
https://github.com/App-vNext/Polly/wiki/Transient-fault-handling-and-proactive-resilience-engineering and https://github.com/App-vNext/Polly/wiki/Polly-and-HttpClientFactory describes more advanced patterns:
One higher level sub-issue is the inference()
method currently fails as soon as a 502 error is received for any GET /task
in the internal task.wait()
: it should not: we just don't know the new state, let's continue trying to get an up-to-date state (until succes/error or global wait timeout).
Let's close this issue as I think most things have been fixed in #58. Reopen if needed (or open a new issue)
Regarding retries and timeouts: We should have both sane default behavior and everything controllable if user wants to fine-tune.
We need: