64bit / async-openai

Rust library for OpenAI
https://docs.rs/async-openai
MIT License

Add retry on 504 Gateway Time-out and "connection closed before message completed" #198

Closed · Christof23 closed this 3 months ago

Christof23 commented 5 months ago

Hi, I can see that requests are retried on a 429 Too Many Requests status code. However, there are no retries for other error types.

For example, a 504 response produces:

ERROR async_openai::error:46: failed deserialization of: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>openresty</center>
</body>
</html>

Similarly, I sometimes receive Error: IncompleteMessage: connection closed before message completed for some requests, and these are not retried either. I believe this is a hyper issue: https://github.com/hyperium/hyper/issues/2136
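
In the absence of library-level retries, one workaround is to wrap calls in a caller-side retry. A minimal sketch, assuming the backoff crate with its tokio feature (async-openai already uses backoff for 429s), a recent async-openai where Client is generic over its config, and an illustrative completions request; which errors count as transient is an assumption here, not library behaviour:

```rust
use async_openai::{
    config::OpenAIConfig,
    error::OpenAIError,
    types::{CreateCompletionRequestArgs, CreateCompletionResponse},
    Client,
};

// Retry transport-level failures (e.g. IncompleteMessage) with exponential
// backoff; treat everything else as permanent. The transient/permanent split
// below is an illustrative assumption.
async fn create_with_retry(
    client: &Client<OpenAIConfig>,
    prompt: &str,
) -> Result<CreateCompletionResponse, OpenAIError> {
    backoff::future::retry(backoff::ExponentialBackoff::default(), || async move {
        let request = CreateCompletionRequestArgs::default()
            .model("gpt-3.5-turbo-instruct") // illustrative model name
            .prompt(prompt)
            .build()
            .map_err(backoff::Error::permanent)?;
        client
            .completions()
            .create(request)
            .await
            .map_err(|e| match e {
                // Errors surfaced from reqwest (connection closed, timeouts, ...)
                OpenAIError::Reqwest(_) => backoff::Error::transient(e),
                other => backoff::Error::permanent(other),
            })
    })
    .await
}
```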

Christof23 commented 5 months ago

PR for the 504: https://github.com/64bit/async-openai/pull/199

64bit commented 5 months ago

Hi @Christof23 ,

Thank you for the issue and PR.

What is the rationale for retrying when the API is down and could potentially stay down for hours until the incident is resolved?

64bit commented 5 months ago

Similarly, I sometimes receive Error: IncompleteMessage: connection closed before message completed for some requests which are not retried. I believe this is a hyper issue: https://github.com/hyperium/hyper/issues/2136

Based on that comment, does configuring the same parameter on a custom reqwest client fix the issue?
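
For reference, the workaround discussed in the hyper issue is to stop reusing idle pooled connections. A minimal sketch, assuming async-openai's Client exposes a with_http_client builder for injecting a custom reqwest client (adapt if the injection point differs in your version):

```rust
use std::time::Duration;
use async_openai::{config::OpenAIConfig, Client};

fn build_client() -> Result<Client<OpenAIConfig>, reqwest::Error> {
    let http_client = reqwest::Client::builder()
        // Drop idle connections quickly so they are not reused after the
        // server has already closed them...
        .pool_idle_timeout(Duration::from_secs(2))
        // ...or keep no idle connections per host at all.
        .pool_max_idle_per_host(0)
        .build()?;
    Ok(Client::new().with_http_client(http_client))
}
```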

Christof23 commented 5 months ago

Hi @64bit, thanks for your response.

I take your point on the 504 response; however, I've seen this response while other concurrent requests went through successfully. I'm not sure how OpenAI configures their infrastructure, but it seems that a 504 from their API doesn't necessarily mean the API is unavailable.

For the reqwest client, I think I tried the fix with the client configuration, but I'll double-check.

64bit commented 5 months ago

The right thing to do here is to contact OpenAI and report that their API is failing. Introducing retries propagates that failure to everyone and makes the library non-deterministic.

That said, one thing that could be done in async-openai is to propagate the payload that caused the deserialization error, so that library users can decide what the right action is.

So OpenAIError::JSONDeserialize(serde_json::Error) becomes OpenAIError::JSONDeserialize(serde_json::Error, String), where the String is the payload that OpenAI responded with.
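
A rough sketch of what that variant change and the caller-side decision could look like. This is hypothetical code mirroring the existing thiserror-based OpenAIError, not the current API; the should_retry helper and its 504 check are illustrative:

```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum OpenAIError {
    // ...other existing variants elided...
    /// Underlying error from the reqwest library.
    #[error("http error: {0}")]
    Reqwest(#[from] reqwest::Error),
    /// Proposed: carry the raw response body alongside the serde error.
    #[error("failed to deserialize api response: {0}")]
    JSONDeserialize(serde_json::Error, String),
}

// Library users could then inspect the raw payload and decide, for example,
// to retry only on transient gateway errors such as the 504 page above.
fn should_retry(err: &OpenAIError) -> bool {
    match err {
        OpenAIError::JSONDeserialize(_, payload) => payload.contains("504 Gateway Time-out"),
        _ => false,
    }
}
```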