cfortuner / promptable

Build LLM apps in TypeScript/JavaScript. 🧑‍💻 🧑‍💻 🧑‍💻 🚀 🚀 🚀
https://docs-promptable.vercel.app
MIT License

Adds token and request-based rate limiting with an example #29

Open · hanrelan opened 1 year ago

hanrelan commented 1 year ago

The rate limits on OpenAI seem a little wonky. I'm not sure if that's only true for Codex or for all the models, but I was definitely getting rate limited even when using less than half of the allowed requests per minute.

But it's better than nothing (maybe?)
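For context, a combined request- and token-based limiter can be sketched roughly like this in TypeScript. The `RateLimiter` class and its `acquire` method are illustrative names, not the PR's actual API: it keeps a one-minute sliding window over both request timestamps and token usage, and waits until both budgets have room.

```typescript
// Minimal sketch of a combined request- and token-based rate limiter.
// Names (RateLimiter, acquire) are illustrative, not the PR's API.

interface RateLimiterOptions {
  requestsPerMinute: number;
  tokensPerMinute: number;
}

class RateLimiter {
  private requestTimestamps: number[] = [];
  private tokenLog: { time: number; tokens: number }[] = [];

  constructor(private opts: RateLimiterOptions) {}

  // Drop entries older than one minute from the sliding window.
  private prune(now: number) {
    const cutoff = now - 60_000;
    this.requestTimestamps = this.requestTimestamps.filter((t) => t > cutoff);
    this.tokenLog = this.tokenLog.filter((e) => e.time > cutoff);
  }

  // Wait until both the request and token budgets have room.
  async acquire(tokens: number): Promise<void> {
    for (;;) {
      const now = Date.now();
      this.prune(now);
      const tokensUsed = this.tokenLog.reduce((sum, e) => sum + e.tokens, 0);
      if (
        this.requestTimestamps.length < this.opts.requestsPerMinute &&
        tokensUsed + tokens <= this.opts.tokensPerMinute
      ) {
        this.requestTimestamps.push(now);
        this.tokenLog.push({ time: now, tokens });
        return;
      }
      // Budget exhausted: sleep briefly before re-checking.
      await new Promise((resolve) => setTimeout(resolve, 250));
    }
  }
}

// Usage: gate each OpenAI call behind the limiter.
const limiter = new RateLimiter({ requestsPerMinute: 20, tokensPerMinute: 40_000 });
async function rateLimitedCompletion(prompt: string, estimatedTokens: number) {
  await limiter.acquire(estimatedTokens);
  // ... call the OpenAI API here ...
}
```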

vercel[bot] commented 1 year ago

The latest updates on your projects:

| Name | Status | Updated |
| --- | --- | --- |
| docs-promptable | ✅ Ready | Feb 17, 2023 at 6:32AM (UTC) |
mathisobadia commented 1 year ago

I think this is useful but limited. The issue is that in a serverless environment (like Next.js API routes), every API call is handled by a different Lambda function that has no knowledge of the others. So this will rate-limit each Lambda individually, but you can still have, say, 100 Lambdas making API calls at the same time and getting rate limited. To account for that, the only real solution is some kind of retry with exponential backoff, as sketched below.
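For illustration, a backoff wrapper along those lines might look like this; `withExponentialBackoff` and the HTTP 429 status check are assumptions about the error shape, not promptable's API.

```typescript
// Minimal sketch of retry with exponential backoff plus jitter, the
// approach suggested above for serverless deployments. Names and the
// error shape (err.response.status) are assumptions for illustration.

async function withExponentialBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Only retry rate-limit errors (HTTP 429); rethrow anything else,
      // or once the retry budget is exhausted.
      const isRateLimit = err?.response?.status === 429;
      if (!isRateLimit || attempt >= maxRetries) throw err;
      // Delay doubles each attempt; random jitter keeps many concurrent
      // Lambdas from retrying in lockstep.
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: wrap any OpenAI call, e.g.
// const completion = await withExponentialBackoff(() => callOpenAI(prompt));
```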