Open wangsir-cn opened 2 weeks ago
It takes two hours to evaluate two indicators (metrics) on 500 data records. Is there any method to accelerate the evaluation?
Hey @wangsir-cn! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!
Yes, there is a way to accelerate the evaluation of two metrics over 500 data records. The provided code already uses asynchronous evaluation via the is_async=True parameter of the evaluate function, which should speed up the evaluation process.
Here is the relevant part of the code:
import time

from ragas import evaluate  # eval_dataset and metrics are assumed to be defined elsewhere

if __name__ == "__main__":
    # asyncio
    print("Starting [Asyncio]")
    start = time.time()
    _ = evaluate(
        eval_dataset,       # the 500-record dataset
        metrics=metrics,    # the two metrics being evaluated
        is_async=True,
    )
    print(f"Time taken [Asyncio]: {time.time() - start:.2f}s")
Additionally, you can leverage the Athina platform to further optimize and monitor your evaluations. Athina allows you to run Ragas evaluations programmatically and view the results on its UI. This can provide more granular insights and potentially improve the efficiency of your evaluation process.
To use Athina, follow these steps:

1. Install Athina's Python SDK:

   pip install athina

2. Create an account at app.athina.ai and obtain an API key.

3. Use the following sample code to run your evaluations:
import os

import pandas as pd
from athina.evals import (
    RagasAnswerCorrectness,
    RagasAnswerRelevancy,
    RagasContextRelevancy,
    RagasFaithfulness,
)
from athina.loaders import RagasLoader
from athina.keys import AthinaApiKey, OpenAiApiKey
from athina.runner.run import EvalRunner

# Set your API keys
OpenAiApiKey.set_key(os.getenv("OPENAI_API_KEY"))
AthinaApiKey.set_key(os.getenv("ATHINA_API_KEY"))

# Load your dataset
dataset = RagasLoader().load_json("raw_data.json")

# Configure the eval suite
eval_model = "gpt-3.5-turbo"
eval_suite = [
    RagasAnswerCorrectness(),
    RagasFaithfulness(),
    RagasContextRelevancy(),
    RagasAnswerRelevancy(),
]

# Run the evaluation suite
batch_eval_result = EvalRunner.run_suite(
    evals=eval_suite,
    data=dataset,
    max_parallel_evals=1,  # Adjust this based on your rate limits
)
pd.DataFrame(batch_eval_result)
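Note that max_parallel_evals=1 runs the suite sequentially. Since the goal here is speed, raising it (within your provider's rate limits) should reduce wall-clock time roughly in proportion to the parallelism.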
By using Athina, you can also configure Ragas to run automatically on your production logs, providing continuous evaluation and monitoring of your model's performance. This can significantly reduce the time required for manual evaluations and provide more timely insights [1][2].
To continue talking to Dosu, mention @dosu.
@wangsir-cn were you able to figure it out? We do use async to make it as fast as possible, but the biggest time sink is the LLM. Which one are you using?
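If the judge model is a slow one, pointing the metrics at a faster model is usually the biggest win. A minimal sketch, assuming ragas 0.1.x where evaluate accepts an llm wrapper (the model name is only illustrative):

from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper

# Evaluation time is dominated by LLM latency, so a smaller/faster judge
# model shortens it the most. The model name below is illustrative.
fast_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

_ = evaluate(
    eval_dataset,
    metrics=metrics,
    llm=fast_llm,   # the metrics use this model instead of the default
    is_async=True,
)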