Open cfortuner opened 1 year ago
Another option for retry logic here, which is probably more performant and better supported, is the async-retry library.
Here's an example implementation from my codebase, wrapping the openai SDK:
```typescript
import retry from "async-retry";

export const openaiCompletion = trace("openaiCompletion", _openaiCompletion);

async function _openaiCompletion(
  prompt: string,
  model: string = "text-davinci-003",
  temperature: number = 1,
  nTokens: number = 500
): Promise<string> {
  const response = await retry(
    // `bail` can be called to stop retrying on non-retryable errors (e.g. 4xx).
    async (bail) => {
      return openai.createCompletion({
        model,
        prompt,
        temperature,
        max_tokens: nTokens,
        top_p: 1,
        frequency_penalty: 0,
        presence_penalty: 0,
      });
    },
    {
      retries: 8,
      factor: 4,
      minTimeout: 1000,
      // onRetry: (error: any) => console.log(error)
    }
  );
  const text = response.data.choices[0].text;
  return text!;
}
```
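For context on what those options imply: async-retry delegates to the `retry` package, where each wait is roughly `minTimeout * factor^attempt`, capped by `maxTimeout` (which defaults to Infinity). A quick sketch of the schedule those values produce, ignoring the randomization async-retry applies by default:

```typescript
// Approximate exponential-backoff delays for async-retry's options
// (randomization ignored): wait n = minTimeout * factor^n, capped at maxTimeout.
function backoffDelays(
  retries: number,
  factor: number,
  minTimeout: number,
  maxTimeout = Infinity
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout));
  }
  return delays;
}

// With retries: 8, factor: 4, minTimeout: 1000 the waits grow from 1s to ~4.5h.
const delays = backoffDelays(8, 4, 1000);
```

Note that with `retries: 8` and `factor: 4` the final wait alone is about 4.5 hours and the worst-case total is roughly six hours, so setting a `maxTimeout` (or a smaller factor) is probably worth considering.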
Thoughts:
I would probably opt to add this at the base ModelProvider level, and I think it's worth considering implementing it as a function decorator so that the retry logic can be added without much additional boilerplate. That said, thinking carefully about retry logic on an API-specific basis is a genuinely good idea, because what works for OpenAI's APIs might not hold for other service providers.
Just updating here: holding off on adding this for now; we have some other ideas we'd like to try that might be better.
Cool @cfortuner lmk if you want me to review the solution when you have a PR.
Having read @mathisobadia's comments, I think that makes more sense. There is a trade-off here:
One more thing to consider is how to handle embedding requests that exceed 250k tokens per minute. If we batch, we have to construct each batch to stay below that limit. If we process texts one by one, we are safe, since a single embedding request cannot exceed the model's max token length; even if the total across requests exceeds 250k/min, the retry policies would handle it.
So I'd say reducing the time needed to process embeddings and being able to handle large texts is more important (at least for my case). But maybe we can have processing policies that handle both.
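To illustrate the batching side of that trade-off, here is a hypothetical sketch (`batchByTokenBudget` and the ~4-chars-per-token estimate are stand-ins, not anything from this repo) of packing texts into batches that each stay under a token budget; a real version would count tokens with a proper tokenizer:

```typescript
// Greedily pack texts into batches whose estimated token total stays under
// `budget`. A single text larger than the budget still gets its own batch;
// per the discussion above, one request can't exceed the model's max tokens,
// and rate-limit overflows across batches are left to the retry policy.
function batchByTokenBudget(
  texts: string[],
  budget: number,
  estimate: (t: string) => number = (t) => Math.ceil(t.length / 4)
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const text of texts) {
    const cost = estimate(text);
    if (current.length > 0 && used + cost > budget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(text);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```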
Adds a new utility -> Retry!
It's a TypeScript decorator that lets you retry a call up to N times:
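The PR's code block doesn't appear to have survived here, so as a hedged sketch of the general shape (`withRetry` is a hypothetical name, and the actual PR presumably uses decorator syntax on class methods rather than a plain wrapper):

```typescript
// Hypothetical sketch: re-invoke an async function up to `times` attempts,
// waiting `delayMs` between failures, and rethrow the last error once
// attempts are exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  times: number,
  delayMs = 0
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= times; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < times && delayMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Example: a function that fails twice, then succeeds on the third attempt.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error("transient");
  return "ok";
};
```

A method-decorator version would move this loop into a `PropertyDescriptor` wrapper, which requires `experimentalDecorators` in tsconfig.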
Let me know what you think!