explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0
6.58k stars 646 forks

It takes two hours to evaluate two indicators with 500 data records. Is there any method to accelerate the evaluation? #1238

Open wangsir-cn opened 2 weeks ago

wangsir-cn commented 2 weeks ago

It takes two hours to evaluate two indicators with 500 data records. Is there any method to accelerate the evaluation?

dosubot[bot] commented 2 weeks ago

Hey @wangsir-cn! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

Yes. The provided code already uses asynchronous evaluation via the is_async=True parameter of the evaluate function, which issues the underlying LLM calls concurrently and should speed up the evaluation.

Here is the relevant part of the code:

import time

from ragas import evaluate

# eval_dataset and metrics are assumed to be defined earlier in your script.
if __name__ == "__main__":
    # asyncio
    print("Starting [Asyncio]")
    start = time.time()
    _ = evaluate(
        eval_dataset,
        metrics=metrics,
        is_async=True,
    )
    print(f"Time taken [Asyncio]: {time.time() - start:.2f}s")
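The speedup from is_async comes from running many network-bound LLM requests concurrently instead of one after another. Here is a minimal, self-contained sketch of that idea using only the standard library; fake_llm_call is a stand-in for a real LLM request, and the semaphore bounds concurrency the way evaluation frameworks do to respect provider rate limits:

```python
import asyncio
import time

async def fake_llm_call(i: int) -> int:
    # Stand-in for a network-bound LLM request taking ~0.1s.
    await asyncio.sleep(0.1)
    return i

async def run_all(n: int, max_concurrency: int) -> list[int]:
    # Cap in-flight requests with a semaphore to stay under rate limits.
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(i: int) -> int:
        async with sem:
            return await fake_llm_call(i)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(i) for i in range(n)))

start = time.time()
results = asyncio.run(run_all(20, max_concurrency=10))
elapsed = time.time() - start
# 20 calls at 0.1s each: ~2s sequentially, roughly 0.2s at concurrency 10.
print(f"{len(results)} calls in {elapsed:.2f}s")
```

With 500 records and multiple metrics, wall-clock time is dominated by how many of these calls can be in flight at once, which is why the provider's rate limits usually end up being the binding constraint.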

Additionally, you can use the Athina platform to run Ragas evaluations programmatically and view the results in its UI. This gives more granular, per-metric insight into each evaluation run and can help you monitor and tune the process.

To use Athina, follow these steps:

  1. Install Athina's Python SDK:

    pip install athina
  2. Create an account at app.athina.ai and obtain an API key.

  3. Use the following sample code to run your evaluations:

    import os
    from athina.evals import (
        RagasAnswerCorrectness,
        RagasAnswerRelevancy,
        RagasContextRelevancy,
        RagasFaithfulness,
    )
    from athina.loaders import RagasLoader
    from athina.keys import AthinaApiKey, OpenAiApiKey
    from athina.runner.run import EvalRunner
    import pandas as pd
    
    # Set your API keys
    OpenAiApiKey.set_key(os.getenv("OPENAI_API_KEY"))
    AthinaApiKey.set_key(os.getenv("ATHINA_API_KEY"))
    
    # Load your dataset
    dataset = RagasLoader().load_json("raw_data.json")
    
    # Configure the eval suite
    eval_model = "gpt-3.5-turbo"
    eval_suite = [
        RagasAnswerCorrectness(),
        RagasFaithfulness(),
        RagasContextRelevancy(),
        RagasAnswerRelevancy(),
    ]
    
    # Run the evaluation suite
    batch_eval_result = EvalRunner.run_suite(
        evals=eval_suite,
        data=dataset,
        max_parallel_evals=1,  # Adjust this based on your rate limits
    )
    
    pd.DataFrame(batch_eval_result)
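The final line above turns the suite results into a DataFrame for inspection. As a small self-contained sketch of what you can then do with such a table (the rows and column names here are made up for illustration and are not Athina's actual output schema):

```python
import pandas as pd

# Hypothetical per-row results; Athina's real output schema may differ.
batch_eval_result = [
    {"query": "q1", "ragas_faithfulness": 0.92, "ragas_answer_relevancy": 0.88},
    {"query": "q2", "ragas_faithfulness": 0.61, "ragas_answer_relevancy": 0.74},
]

df = pd.DataFrame(batch_eval_result)
# Flag rows below a faithfulness threshold for manual review.
low_faithfulness = df[df["ragas_faithfulness"] < 0.7]
print(low_faithfulness["query"].tolist())  # → ['q2']
```

Filtering like this lets you focus manual review on the worst-scoring records instead of re-reading all 500.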

By using Athina, you can also configure Ragas to run automatically on your production logs, providing continuous evaluation and monitoring of your model's performance. This can significantly reduce the time required for manual evaluations and provide more timely insights [1][2].

To continue talking to Dosu, mention @dosu.

jjmachan commented 1 week ago

@wangsir-cn were you able to figure it out? We do use async to make it as fast as possible, but the biggest time sink is the LLM itself. Which one are you using?