periodically save current progress and restart from last save

explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀

https://docs.ragas.io

Apache License 2.0

periodically save current progress and restart from last save #1522

Open trevorbowen opened 1 day ago

trevorbowen commented 1 day ago

Both the evaluation and testset generation are computationally and monetarily expensive.

It is very painful to experience a crash after several expensive hours of evaluation or generation, only to have nothing to show for it. If a project's budget is spent on a long run, and it crashes near the end, it may be impossible for the project to conclude without shifting to a different platform.

Could you please add optional, configurable support (ideally, enabled by default) to periodically save current progress to disk, and resume progress from saved state? After using this tool for a while, several objects seem like they could be saved to disk periodically and restored, if they exist:

Embedding vectors
Vectorized document stores
Generated questions
Evaluated questions
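A minimal sketch of the kind of save-and-resume behavior being requested, using only the Python standard library. The names `run_with_checkpoint`, `process`, and `checkpoint_path` are hypothetical and are not part of ragas; `process` stands in for whatever expensive per-item call (question generation, evaluation) is being made. It assumes each item is a unique string and each result is JSON-serializable.

```python
import json
from pathlib import Path
from typing import Callable, Sequence


def run_with_checkpoint(
    items: Sequence[str],
    process: Callable[[str], dict],
    checkpoint_path: Path,
) -> list[dict]:
    """Process items one by one, appending each result to a JSONL file.

    On restart, items whose results are already on disk are skipped,
    so only the unfinished work is repeated.
    """
    done: dict[str, dict] = {}

    # Reload any results saved by a previous (possibly crashed) run.
    if checkpoint_path.exists():
        with checkpoint_path.open() as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                done[record["item"]] = record["result"]

    # Append mode keeps earlier results intact.
    with checkpoint_path.open("a") as f:
        for item in items:
            if item in done:
                continue
            result = process(item)  # the expensive call (hypothetical)
            f.write(json.dumps({"item": item, "result": result}) + "\n")
            f.flush()  # push the record out of Python's buffer right away
            done[item] = result

    return [done[item] for item in items]
```

With this shape, a crash on item 150 of 200 leaves 149 results on disk, and calling the function again with the same `checkpoint_path` picks up at item 150.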

jjmachan commented 1 day ago

@trevorbowen I'm sorry for the hard time, and this is something we will improve.

But today you do have a few options. raise_exceptions=False, which is the default, should catch all exceptions while running evaluations or testset generation. If you find any that still get through, please do share them here. (This is for v0.2+; if you are using v0.1, do migrate: https://docs.ragas.io/en/stable/howtos/migrations/migrate_from_v01_to_v02/)

You can also save testset generation progress. The process is divided into 2 steps:

Generate the transforms, which will produce a knowledge graph you can save
Do the generations

So the 3 points you mentioned that need to be stored can already be stored. We will improve the docs so that it's more apparent.
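As a rough illustration of the two-step split described above, here is a standard-library sketch. This is not the ragas API: `load_or_build`, `build_graph`, and `generate_questions` are hypothetical stand-ins, and the dictionary used for the "graph" is an invented structure. The point is only that the expensive first step is saved to disk once and reloaded on later runs.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build(path: Path, build: Callable[[], dict]) -> dict:
    """Return the artifact saved at `path`, building and saving it if absent."""
    if path.exists():
        return json.loads(path.read_text())
    artifact = build()
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a half-written artifact at `path`.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(artifact))
    tmp.replace(path)
    return artifact


def build_graph(documents: list[str]) -> dict:
    # Stand-in for step 1 (the transforms); hypothetical structure.
    return {"nodes": [{"id": i, "text": doc} for i, doc in enumerate(documents)]}


def generate_questions(graph: dict) -> list[str]:
    # Stand-in for step 2 (the generations); a real run would call an LLM here.
    return [f"What does document {node['id']} say?" for node in graph["nodes"]]
```

If step 2 fails, rerunning calls `load_or_build` again, finds the saved file, and skips step 1 entirely.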

If you have some specific examples, do share them too. We would love to build out more options to make it easier to run these experiments for you. I hope this helps 🙂

trevorbowen commented 1 day ago

I am trying to generate a test set of 200 questions, covering about 100 documents. Currently, this will take 3 days because of rate limits with OpenAI. If anything glitches during that time period, I have zero questions and must start all over.

I am asking for the generation process to be enhanced to periodically dump its state, so it can resume if I lose power or internet, hit an unexpected program error, etc.

I don't think this is possible today without enhancements to the code.

jjmachan commented 1 hour ago

@trevorbowen actually there are ways we can work around this with the existing tools. I would love to go over them with you, to see if it helps

Any chance we might be able to get on a call?

trevorbowen commented 59 minutes ago

Sure, @jjmachan. I will email you.

jjmachan commented 52 minutes ago

awesome 🙂, if we figure it out we can add it back into the docs too

trevorbowen commented 7 minutes ago

@jjmachan, email sent. If you don't receive it, please let me know. ... Thanks!