In high-load scenarios (e.g. our GPUs are busy and a queue builds up) we prefer to fail fast with a "busy" error rather than let the user wait for a long time, where "long" means roughly 1–2 minutes. In some use cases the answer would no longer be meaningful to the user after such a wait.
If the answer would still be meaningful after the wait, you can ignore the error and retry immediately (see the sketch below).
We'll try to avoid this scenario in the future, mostly by adding capacity, improving inference speed, and allocating resources more intelligently. There is little we can do about it on the client side.
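For callers who can tolerate the wait, a wrapper along these lines is one option. This is a minimal sketch against the plain HTTP API: the `/complete` endpoint URL, the payload shape, and the small exponential backoff (the comment above suggests an immediate retry is also fine) are illustrative assumptions, not details confirmed in this thread.

```python
import time

import requests

API_URL = "https://api.aleph-alpha.com/complete"  # assumed endpoint


def complete_with_retry(payload: dict, token: str,
                        max_retries: int = 3,
                        base_delay_s: float = 2.0) -> dict:
    """Send a completion request, retrying only on 503 QUEUE_FULL responses."""
    for attempt in range(max_retries + 1):
        resp = requests.post(
            API_URL,
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
            timeout=130,
        )
        if resp.status_code != 503:
            resp.raise_for_status()  # raise on any other HTTP error
            return resp.json()
        # 503: check whether it is the QUEUE_FULL rejection from this thread
        try:
            code = resp.json().get("code")
        except ValueError:
            code = None
        if code != "QUEUE_FULL" or attempt == max_retries:
            resp.raise_for_status()
        # back off briefly before retrying; tune to your latency budget
        time.sleep(base_delay_s * (2 ** attempt))
```

A payload would look roughly like `{"model": "luminous-nextgen-66b-global-step140000-adapter-control-v1", "prompt": "...", "maximum_tokens": 64}`; the field names here are assumed for illustration.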
ping @ahartel
ping @wolfgangihloff
Hi @capgeminichristianerling, thanks for the ticket. Due to a bug introduced yesterday, we had some outages in the availability of the model you mentioned. The bug lowered that model's throughput, which caused the queues in our system to fill up and produced the error you saw. Operations should normalize over the course of today. For future inquiries of this kind, please feel free to reach out to support@aleph-alpha.com.
Issue: We are getting the following response from the server when we send a CompletionRequest using the model:
luminous-nextgen-66b-global-step140000-adapter-control-v1
Response: (503, '{"error":"Sorry we had to reject your request because we could not guarantee to finish it in a reasonable timeframe. This specific model is very busy at this moment. Try again later or use another model.","code":"QUEUE_FULL"}')
Using the Prompt: