cfortuner / promptable

Build LLM apps in Typescript/Javascript. 🧑‍💻 🧑‍💻 🧑‍💻 🚀 🚀 🚀
https://docs-promptable.vercel.app

adding exp backoff retry decorator to openai embedding and completion calls #12

Open cfortuner opened 1 year ago

cfortuner commented 1 year ago

Adds a new utility -> Retry!

It's a TypeScript decorator that lets you retry a call up to N times:

  @retry(3)
  async generate(
    promptText: string,
    options: GenerateCompletionOptions = DEFAULT_COMPLETION_OPTIONS
  ) {
    try {
      // ... make the completion call ...
    } catch (e) {
      // ... handle / rethrow ...
    }
  }
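
For context, here is a minimal sketch of what the decorator itself could look like, with exponential backoff between attempts (illustrative only; the names and delay values are placeholders, not the exact code in this PR):

// Minimal sketch of a retry(n) method decorator with exponential backoff.
// Assumes TypeScript's experimentalDecorators flag is enabled.
function retry(maxAttempts: number, baseDelayMs = 1000) {
  return function (
    _target: unknown,
    _propertyKey: string,
    descriptor: PropertyDescriptor
  ) {
    const original = descriptor.value;
    descriptor.value = async function (this: unknown, ...args: unknown[]) {
      let lastError: unknown;
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return await original.apply(this, args);
        } catch (err) {
          lastError = err;
          if (attempt === maxAttempts) break;
          // Exponential backoff: 1s, 2s, 4s, ... before the next attempt.
          const delayMs = baseDelayMs * 2 ** (attempt - 1);
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
      throw lastError;
    };
    return descriptor;
  };
}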

Let me know what you think!

vercel[bot] commented 1 year ago

The latest updates on your projects.

docs-promptable: ❌ Failed · Feb 14, 2023 at 7:26PM (UTC)
yourbuddyconner commented 1 year ago

Another option for the retry logic here, which is probably more performant and better supported, is the async-retry library.

Here's an example implementation from my codebase, wrapping the openai SDK:

import retry from "async-retry";

// `trace` is a tracing helper and `openai` an initialized OpenAIApi client
// defined elsewhere in the codebase.
export const openaiCompletion = trace("openaiCompletion", _openaiCompletion);

async function _openaiCompletion(
    prompt: string,
    model: string = "text-davinci-003",
    temperature: number = 1,
    nTokens: number = 500
): Promise<string> {
    // Retry with exponential backoff: up to 8 retries, waiting
    // 1s, 4s, 16s, ... between attempts.
    const response = await retry(
        async (bail) => {
            return openai.createCompletion({
                model: model,
                prompt,
                temperature: temperature,
                max_tokens: nTokens,
                top_p: 1,
                frequency_penalty: 0,
                presence_penalty: 0,
            });
        },
        {
            retries: 8,
            factor: 4,
            minTimeout: 1000,
            // onRetry: (error: any) => console.log(error)
        }
    );
    const text = response.data.choices[0].text;
    return text!;
}
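
One thing the snippet above doesn't use is async-retry's bail callback, which stops retrying immediately. A rough sketch of how it could be used to skip retries on non-retryable errors, reusing the same openai client and variables as above (the status-code handling is illustrative and assumes the axios-style errors thrown by the v3 openai SDK):

const response = await retry(
    async (bail) => {
        try {
            return await openai.createCompletion({ model, prompt });
        } catch (error: any) {
            const status = error?.response?.status;
            // 429 (rate limit) and 5xx are transient and worth retrying;
            // anything else won't succeed on retry, so give up immediately.
            if (status && status !== 429 && status < 500) {
                bail(error);
                return;
            }
            throw error;
        }
    },
    { retries: 8, factor: 4, minTimeout: 1000 }
);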

Thoughts:

I would probably opt to add this at the base ModelProvider level, and I think it's worth considering implementing it as a function decorator so that the retry logic can be added without too much additional boilerplate. That said, thinking about retry logic on an API-specific basis is a really good idea, because what works for the OpenAI APIs might not hold for other service providers.
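
To make the ModelProvider idea concrete, here is a rough sketch where each provider supplies its own retry options (promptable's actual ModelProvider interface may differ; the names here are illustrative):

import retry from "async-retry";

type RetryOptions = { retries: number; factor: number; minTimeout: number };

abstract class ModelProvider {
    // Conservative default; providers override with API-specific values.
    protected retryOptions: RetryOptions = { retries: 3, factor: 2, minTimeout: 1000 };

    // Subclasses implement the raw API call; callers always go through generate().
    protected abstract doGenerate(prompt: string): Promise<string>;

    async generate(prompt: string): Promise<string> {
        return retry(() => this.doGenerate(prompt), this.retryOptions);
    }
}

class OpenAIProvider extends ModelProvider {
    // OpenAI rate limits tend to need longer backoff, so retry harder here.
    protected retryOptions: RetryOptions = { retries: 8, factor: 4, minTimeout: 1000 };

    protected async doGenerate(prompt: string): Promise<string> {
        // ... call the OpenAI SDK here ...
        return "";
    }
}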

cfortuner commented 1 year ago

Just updating here. Holding off on adding this for now; we have some other ideas that we'd like to try that might be better.

yourbuddyconner commented 1 year ago

Cool @cfortuner, let me know if you want me to review the solution when you have a PR.

ymansurozer commented 1 year ago

Having read @mathisobadia's comments, I think that makes more sense. There is a trade-off here:

One more thing to consider is how to handle embedding requests that would exceed 250k tokens per minute. If we batch, we have to construct each batch so it stays below that limit. If we send requests one by one, we are safe, since a single embedding request cannot exceed the model's max token length; even if the total of those requests exceeds 250k tokens/minute, the retry policies would handle it.

So I'd say reducing the time needed to process embeddings and being able to handle large texts are more important (at least for my case). But maybe we could have processing policies that handle both.
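
For the batching route, something like this sketch could pack texts into batches that stay under a token budget (countTokens and the budget value are illustrative stand-ins, e.g. a tiktoken-based counter; the retry policy still backstops anything that slips over the per-minute limit):

// Pack texts into embedding batches that stay under a token budget,
// so a single batched request never blows past the rate limit on its own.
function batchByTokenBudget(
  texts: string[],
  countTokens: (text: string) => number,
  budget: number
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;

  for (const text of texts) {
    const tokens = countTokens(text);
    // Start a new batch once adding this text would exceed the budget.
    if (current.length > 0 && used + tokens > budget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(text);
    used += tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}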