Open · donbr opened this issue 4 months ago
hey @donbr, I'm not sure I fully understood the idea, but we can add that to the benchmark quite easily. I just wasn't able to work out the use case you had in mind.
The tests/benchmarks directory is meant as a test suite for development, hence the hardcoded LLMs. What was the use case for calling them directly? Are you running the benchmarks yourself with different LLMs too?
In that case we can easily add the change you proposed; it would improve things as you said.
Describe the Feature
Currently, the tests/benchmarks/benchmark_testsetgen.py script has hardcoded LLM models for generator_llm, critic_llm, and embeddings. However, other Ragas scripts allow these defaults to be overridden when called from a Jupyter notebook (etc.).

Why is the feature important for you?
Because of the hardcoding in benchmark_testsetgen.py, there is no way to dynamically set global defaults for the generator, critic, and embedding models from a notebook or script when calling Ragas. Current code in benchmark_testsetgen.py:
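Roughly, the hardcoded section looks like the sketch below (assuming the ragas 0.1.x TestsetGenerator.from_langchain API with langchain-openai models; the exact model names are illustrative, not necessarily the ones in the repo):

```python
# Sketch of the hardcoded section in tests/benchmarks/benchmark_testsetgen.py
# (assumes the ragas 0.1.x API; the model names shown are illustrative).
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.testset.generator import TestsetGenerator

# Hardcoded models: callers have no way to override these from a notebook/script.
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)
```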
Additional context
Modify the benchmark_testsetgen.py script to mirror src/ragas/testset/generator.py and support optional parameters, so that the defaults for generator_llm, critic_llm, and embedding_model can be overridden consistently. This would allow users to set global, consistent defaults when calling Ragas from a script / notebook, as sketched below.
I was surprised, when running the Ragas scripts from a notebook, that my GPT-4o setting for critic_llm was ignored and the older, much more expensive base GPT-4 model was used instead. There are a number of better variants on the approach below, but it should be sufficient.
The essential requirement is consistency and transparency about which models are used during each step of the process.
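A minimal sketch of what a parameterized benchmark entry point could look like (the helper name, signature, and default model choices are hypothetical, not an actual patch):

```python
# Hypothetical sketch of a parameterized benchmark helper; the function name,
# signature, and default models below are assumptions, not the actual change.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.testset.generator import TestsetGenerator


def get_generator(generator_llm=None, critic_llm=None, embeddings=None):
    """Build a TestsetGenerator, falling back to the benchmark defaults only
    when the caller has not supplied a model explicitly."""
    generator_llm = generator_llm or ChatOpenAI(model="gpt-3.5-turbo-16k")
    critic_llm = critic_llm or ChatOpenAI(model="gpt-4")
    embeddings = embeddings or OpenAIEmbeddings()
    return TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)
```

A notebook could then call get_generator(critic_llm=ChatOpenAI(model="gpt-4o")) and know exactly which model runs the critic step, while the benchmark keeps its current defaults when run as-is.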