That is a great idea @0ENZO! We use tenacity under the hood, and configuration lives in https://github.com/explodinggradients/ragas/blob/main/src/ragas/run_config.py.
I'll add a sleep option to it, and that should help you.
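For context, here is a hedged sketch of what a tenacity-based retry wrapper like the one in run_config.py can look like. This is an illustration, not the exact ragas source; the parameter names simply mirror the RunConfig fields:

```python
# Sketch of a tenacity-based retry wrapper (illustrative, not the ragas source).
from tenacity import Retrying, stop_after_attempt, wait_random_exponential

def add_retry(fn, max_retries: int = 10, max_wait: int = 60):
    """Wrap `fn` with exponential backoff capped at `max_wait` seconds."""
    retryer = Retrying(
        stop=stop_after_attempt(max_retries),          # give up after N attempts
        wait=wait_random_exponential(multiplier=1, max=max_wait),  # jittered backoff
        reraise=True,                                   # surface the original error
    )
    return lambda *args, **kwargs: retryer(fn, *args, **kwargs)
```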
We should also add something for stats, I guess, so you can see num_tokens, cost, performance figures, etc.
What do you think about those? Have you felt the need for them? If you were to choose only one, which would it be?
Sounds good, thanks!
Regarding performance figures and num_tokens, I haven't had any such need yet.
Hey @0ENZO, after thinking about it a bit more, this seems like a more complicated solution to implement because of how we have things set up.
The core problem here is contention of resources. We could fix it in two ways:

1. ragas implements something like a leaky bucket, so that the number of requests per minute stays constant.
2. configure the exponential backoff to target 60 requests per minute. Right now I don't have a good formula for that, but it is something we could find, right?

So the solution for your problem today is configuring the RunConfig with the correct max_retries and max_wait (and maybe some more, I'll look into that), as sketched below. What do you think?
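A minimal sketch of that configuration, assuming ragas 0.1.x where RunConfig exposes timeout, max_retries, max_wait, and max_workers (check run_config.py for your installed version). `dataset` and `metrics` stand in for your own evaluation inputs:

```python
from ragas import evaluate
from ragas.run_config import RunConfig

run_config = RunConfig(
    timeout=180,     # seconds before a single LLM call is abandoned
    max_retries=10,  # tenacity retry attempts on rate-limit errors
    max_wait=60,     # cap (seconds) on the exponential backoff between retries
    max_workers=4,   # fewer concurrent calls -> fewer requests per minute
)

result = evaluate(dataset, metrics=metrics, run_config=run_config)
```

Lowering max_workers is usually the biggest lever against per-minute quotas, since it directly bounds how many requests are in flight at once.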
Also, I'm running some experiments so that I can get you unblocked without much hassle.
Do you have a suggestion that I could implement now? I am exceeding my Azure GPT-4 rate limit of 80k tokens per minute when evaluating 48 questions/answers across all metrics. Is there a way to rate-limit the evaluation? Perhaps I should pull out some metrics?
@jjmachan, any suggestions on how we should set the run config? I am also facing this issue with ragas 0.1.7
> We should also add something for stats, I guess, so you can see num_tokens, cost, performance figures, etc. What do you think about those? Have you felt the need for them? If you were to choose only one, which would it be?

May I ask if there is any plan for this part?
This will be fixed with #1156. For documentation on run_config, and for how to figure out cost, check out Understand Cost and Usage of Operations | Ragas.
Hope this helps @xiaochaohit @bdeck8317 @klangst-ETR
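For reference, a hedged sketch of the token-usage flow that docs page describes, assuming a ragas version that ships ragas.cost and accepts a token_usage_parser in evaluate() (verify against your installed version); the pricing values below are illustrative, not real rates:

```python
from ragas import evaluate
from ragas.cost import get_token_usage_for_openai

result = evaluate(
    dataset,
    metrics=metrics,
    token_usage_parser=get_token_usage_for_openai,  # records tokens per LLM call
)

print(result.total_tokens())
# Plug in your model's per-token pricing (illustrative values shown).
print(result.total_cost(cost_per_input_token=5e-6, cost_per_output_token=15e-6))
```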
It would be nice to handle LLM quotas when evaluating a large dataset; in my case I cannot raise the default quota of 60 requests per minute for the VertexAI LLM.
Tracking LLM calls for the current minute within .evaluate() might sound a bit like overkill. Offering the possibility to set a time.sleep() between each sample might do the trick, as sketched below.
I don't know what you guys think. Am I the only one to encounter such a problem?
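For illustration, a rough workaround along those lines (not a ragas feature; `throttled_evaluate`, `chunk_size`, and `pause_s` are hypothetical names, and the pacing must be tuned to your own quota):

```python
# Crude throttle: evaluate the dataset in small chunks and sleep between
# chunks to stay under a requests-per-minute quota. Workaround sketch only.
import time
from datasets import Dataset
from ragas import evaluate

def throttled_evaluate(dataset: Dataset, metrics, chunk_size: int = 5, pause_s: float = 60.0):
    """Evaluate `dataset` chunk by chunk, pausing `pause_s` seconds between chunks."""
    results = []
    for start in range(0, len(dataset), chunk_size):
        chunk = dataset.select(range(start, min(start + chunk_size, len(dataset))))
        results.append(evaluate(chunk, metrics=metrics))
        if start + chunk_size < len(dataset):
            time.sleep(pause_s)  # crude substitute for a real leaky bucket
    return results
```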