explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0
6.97k stars 698 forks

Add cost tracking to dataset generator and allow dataset size control #1506

Open alexander-zuev opened 6 days ago

alexander-zuev commented 6 days ago

Describe the Feature There are 2 problems with the current dataset generator approach:

Why is the feature important for you?


shahules786 commented 6 days ago

Hey @Twist333d, very valid concerns. 1) We plan to add support for cost estimation in test generation soon. 2) When efficiently generating (and regenerating) test data points from a single document set, we also have to make sure that newly generated points were not already generated before. We intend to make this efficient by preprocessing the documents once and then letting you persist the intermediate form (the knowledge graph, KG). That way, you could repeatedly sample data points from the same corpus without redoing the preprocessing step. What do you think?
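To make the idea in point 2 concrete, here is a minimal sketch of the "preprocess once, persist, sample repeatedly" flow. It is not the ragas API: `preprocess`, `save_graph`, `load_graph`, `sample_new_points`, and the JSON layout are all hypothetical names chosen for illustration, and the "graph" is just a list of nodes plus a record of which ones were already used.

```python
import json
import random
import tempfile
from pathlib import Path


def preprocess(documents):
    """Stand-in for the expensive one-time step (hypothetical, not the ragas API)."""
    return {
        "nodes": [{"id": i, "text": text} for i, text in enumerate(documents)],
        "used_ids": [],  # ids of nodes already turned into test data points
    }


def save_graph(graph, path):
    """Persist the intermediate form as JSON so preprocessing is not repeated."""
    Path(path).write_text(json.dumps(graph), encoding="utf-8")


def load_graph(path):
    """Reload a previously persisted intermediate form."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def sample_new_points(graph, n, seed=None):
    """Return up to n nodes that were never sampled before and mark them as used."""
    used = set(graph["used_ids"])
    fresh = [node for node in graph["nodes"] if node["id"] not in used]
    rng = random.Random(seed)
    picked = rng.sample(fresh, min(n, len(fresh)))
    graph["used_ids"].extend(node["id"] for node in picked)
    return picked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "kg.json"
        save_graph(preprocess(["doc one", "doc two", "doc three"]), path)

        # A later session: load, sample a controlled number of points, persist again.
        graph = load_graph(path)
        batch = sample_new_points(graph, n=2, seed=0)
        save_graph(graph, path)
        print(len(batch), "new points;", len(graph["used_ids"]), "used so far")
```

The `n` argument is where dataset size control would live in this sketch, and writing `used_ids` back to disk is what keeps later sessions from regenerating the same points.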

We will be continuously improving the new test gen, and would love to chat and understand more from you. https://cal.com/shahul-ragas/30min

alexander-zuev commented 6 days ago

@shahules786 thanks for the prompt response! I think what you describe does indeed address the two core concerns I had:

On the proposal above, how would a user manage the intermediate form? Would I need to save it / manage it locally?