getzep / zep

Zep | The Memory Foundation For Your AI Stack
https://help.getzep.com/ce
Apache License 2.0

[FEAT] Configurable OpenAI timeouts and retry settings for compatible APIs #301

Closed: esatapedico closed this issue 1 month ago

esatapedico commented 10 months ago

Is your feature request related to a problem? Please describe. I'm using LocalAI as an OpenAI-compatible API for self-hosted LLM models, and I've configured Zep to use its endpoint for summarization, intent, and entity extraction.

My local server is not that beefy, however, and requests to it can take several minutes to complete. When Zep starts calling my API for these tasks, requests time out and retries kick in. Not only do responses never come back, but the API also gets overloaded and eventually becomes unusable for a while.

I see that there are retry and timeout settings for OpenAI calls, but they appear to be hardcoded at the moment, so I couldn't adapt them to my needs.

Describe the solution you'd like OpenAI timeouts and retries could be configurable through the config file and environment variables, so that the currently hardcoded values can be overridden. Whether it would make sense to have different values for different kinds of requests (summarization, intent, embeddings), I don't know; maybe it's simpler to have a single setting.
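A minimal sketch of what such configuration could look like, following Zep's existing YAML config and `ZEP_`-prefixed environment variable conventions. The `openai_request_timeout` and `openai_max_retries` keys are hypothetical names for illustration, not existing options:

```yaml
# config.yaml (hypothetical keys shown for illustration)
llm:
  service: "openai"
  openai_endpoint: "http://localai:8080/v1"  # OpenAI-compatible API (placeholder URL)
  openai_request_timeout: 300                # seconds; hypothetical key
  openai_max_retries: 1                      # hypothetical key
```

The same values could then be overridden via environment variables, e.g. a hypothetical `ZEP_LLM_OPENAI_REQUEST_TIMEOUT=300`.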

Describe alternatives you've considered I've turned off intent and entity extraction in an attempt not to overload my API with too many requests in a short period, but unfortunately even a single summarization request can easily take a few minutes in my case. For my use case it's fine if summarization updates take a bit longer, as long as they eventually complete.

Additional context I understand this doesn't make much sense when consuming the predictably fast OpenAI API, but since compatible APIs can be used instead, they can come with very different performance characteristics. I'm falling back to the OpenAI API in Zep for now because I can't use my self-hosted API for it, although I'm using that API successfully in my own application code (where, again, my use case is very tolerant of slow responses).

danielchalef commented 9 months ago

We're refactoring our LLM support with a new release expected late Q1/early Q2. We'll consider making timeouts configurable.

danielchalef commented 1 month ago

This is now supported by routing requests through a proxy such as LiteLLM, which handles timeouts and retries in front of the OpenAI-compatible backend.
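A minimal sketch of that approach, assuming a LiteLLM proxy sits between Zep and the slow backend; the LocalAI URL and model name are placeholders:

```yaml
# LiteLLM proxy config.yaml: route OpenAI-style requests to LocalAI
# with a relaxed timeout and minimal retries
model_list:
  - model_name: gpt-3.5-turbo            # name Zep requests
    litellm_params:
      model: openai/gpt-3.5-turbo        # treat the backend as OpenAI-compatible
      api_base: http://localai:8080/v1   # placeholder LocalAI endpoint

litellm_settings:
  request_timeout: 600                   # seconds; allow slow completions
  num_retries: 1                         # avoid piling retries onto a busy server
```

Zep's OpenAI endpoint would then point at the proxy (by default http://localhost:4000), so timeout and retry behavior is controlled at the proxy rather than inside Zep.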