plaban1981 opened this issue 1 month ago
I found a similar open issue that might help you resolve the `TimeoutError` when using the `evaluate()` function with Langchain-wrapped models in the ragas library. The issue is titled "Facing error for evaluate() for Langchain instance LLM and Embedding models" and is still open [1]. The suggested solution involves setting the `timeout` parameter in the `RunConfig` and ensuring it is passed to both the LLM and the embeddings. Here is an example of how to do it:
```python
from ragas.run_config import RunConfig
from ragas.llms.base import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision
from ragas import evaluate
from datasets import Dataset

# Create a RunConfig with a specified timeout
run_config = RunConfig(timeout=60)  # Set timeout to 60 seconds

# Initialize the LLM with the run_config
langchain_llm = ChatOllama(model="llama3.2", run_config=run_config)

# Initialize the embeddings with the run_config
langchain_embeddings = OllamaEmbeddings(model="llama3.2", run_config=run_config)

# Create the dataset
data_samples = {
    'question': ['What should I be concerned about when considering an autonomous vehicle exposure?'],
    'answer': ['When considering an autonomous vehicle exposure, you should be aware of the rapid development of autonomous vehicle technology.'],
    'contexts': [['Source: Casualty-Automotive, Vehicles and Autonomous Vehicles. The rapid development of autonomous vehicle technology is a key concern.']],
}
dataset = Dataset.from_dict(data_samples)

# Use the evaluate function
result = evaluate(dataset, metrics=[answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision], llm=langchain_llm, embeddings=langchain_embeddings)

# Extract the details into a dataframe
results_df = result.to_pandas()
print(results_df)
```
Additionally, you can enable debugging and logging options to diagnose the `TimeoutError` by setting the `log_tenacity` parameter in `RunConfig` to `True` and enabling the `RAGAS_DEBUG` environment variable:
```python
import os

os.environ["RAGAS_DEBUG"] = "true"
run_config = RunConfig(timeout=60, log_tenacity=True)
```
This will provide comprehensive logging and debugging information [1].
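Note that `ChatOllama` and `OllamaEmbeddings` are LangChain classes, and it is not certain they consume a `run_config` keyword at all. In ragas 0.1.x, `evaluate()` itself accepts a `run_config` argument, so a variant worth trying (a sketch, reusing the dataset, models, and metrics defined above, with a timeout sized for a slow local model) hands the config to the evaluation loop directly:

```python
from ragas import evaluate
from ragas.run_config import RunConfig

# Local models can easily exceed 60 s per request; give each job more room
run_config = RunConfig(timeout=300, log_tenacity=True)

result = evaluate(
    dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy,
             context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
    run_config=run_config,  # handed to the executor that schedules the metric jobs
)
```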
@dosu Encountered the same error even after including the `RunConfig` with `timeout` set to 60:
```python
import os

os.environ["RAGAS_DEBUG"] = "true"
run_config = RunConfig(timeout=60, log_tenacity=True)

# Initialize the LLM with the run_config
langchain_llm = ChatOllama(model="llama3.2", run_config=run_config)

# Initialize the embeddings with the run_config
langchain_embeddings = OllamaEmbeddings(model="llama3.2")

result = evaluate(
    eval_dataset,
    metrics=[
        answer_correctness, answer_similarity,
        answer_relevancy,
        context_recall, context_precision,
    ],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
)
```
```text
Evaluating:   0%|          | 0/5 [00:00<?, ?it/s]Exception raised in Job[1]: TimeoutError()
Evaluating:  20%|██        | 1/5 [03:00<12:00, 180.02s/it]Exception raised in Job[0]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[4]: TimeoutError()
Exception raised in Job[3]: TimeoutError()
Evaluating: 100%|██████████| 5/5 [03:00<00:00, 36.01s/it]
```
To resolve the `TimeoutError` when using the `evaluate()` function with `ChatOllama` and `OllamaEmbeddings`, ensure that the `timeout` parameter is correctly passed to both the LLM and the embeddings. It looks like the `run_config` is not being passed to the `OllamaEmbeddings` in your code. Here is the corrected version:
```python
import os

from ragas.run_config import RunConfig
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision

# Enable debugging
os.environ["RAGAS_DEBUG"] = "true"

# Create a RunConfig with a specified timeout
run_config = RunConfig(timeout=60, log_tenacity=True)

# Initialize the LLM with the run_config
langchain_llm = ChatOllama(model="llama3.2", run_config=run_config)

# Initialize the embeddings with the run_config
langchain_embeddings = OllamaEmbeddings(model="llama3.2", run_config=run_config)

# Use the evaluate function
result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
)

# Extract the details into a dataframe
results_df = result.to_pandas()
print(results_df)
```
Ensure that both `ChatOllama` and `OllamaEmbeddings` are correctly configured to respect the `RunConfig` settings, including the `timeout` parameter [1][2].
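One detail worth checking: the trace above shows jobs dying at 180.02 s/it, which matches `RunConfig`'s default timeout of 180 s, suggesting the config attached to the model constructors may never reach the executor. An alternative route (a sketch, assuming the wrapper signatures in ragas 0.1.x) is to wrap the LangChain objects in ragas' own wrappers, which accept a `RunConfig` directly:

```python
from ragas import evaluate
from ragas.run_config import RunConfig
from ragas.llms.base import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision

run_config = RunConfig(timeout=300, log_tenacity=True)

# Attach the RunConfig where ragas actually reads it: on its own wrappers
wrapped_llm = LangchainLLMWrapper(ChatOllama(model="llama3.2"), run_config=run_config)
wrapped_embeddings = LangchainEmbeddingsWrapper(OllamaEmbeddings(model="llama3.2"), run_config=run_config)

result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy,
             context_recall, context_precision],
    llm=wrapped_llm,
    embeddings=wrapped_embeddings,
    run_config=run_config,  # also pass it to the executor itself
)
```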
It looks like we're facing a persistent issue here. @jjmachan, your expertise would be greatly appreciated to help resolve this.
@jjmachan
Could you please help me? We are trying to evaluate RAG responses using Ollama on our local system and encounter the above error. The same error also persists when using Groq.
Any help will be much appreciated. Thank you.
@dosu / @jjmachan
Any updates on how to resolve the issue?
A maintainer has already been looped in.
hey @plaban1981 - great to see you again 🙂
currently Ollama models are sadly not supported; could you try any of the other model providers?
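For anyone following along, swapping the provider only changes the two model objects; a minimal sketch with OpenAI (assuming `langchain_openai` is installed and `OPENAI_API_KEY` is set; the model names here are just examples):

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision

langchain_llm = ChatOpenAI(model="gpt-4o-mini")
langchain_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# The evaluate() call itself is unchanged; only the model objects differ
result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy,
             context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
)
```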
@jjmachan Thank you. I did try with Groq models as well, but I encounter the same issue. I would be grateful if you could help me with a code sample where RAGAS evaluation can be done using an open-source model. Thanks
@jjmachan Please check this notebook: https://github.com/mosh98/RAG_With_Models/blob/main/evaluation/RAGAS%20DEMO.ipynb. They implemented RAGAS using ChatOllama and were able to produce results.
@plaban1981 thanks a lot for sharing that! did that work for you too?
@jjmachan Unfortunately no. Can you please help me with any working example of RAGAS evaluation using an open-source model?
Thank you, Plaban
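One workaround sometimes suggested for slow local models (offered as a sketch, not a verified fix) is to raise the timeout well past the 180 s at which the jobs above die and to cut concurrency, since `RunConfig` defaults to 16 parallel workers and a single local Ollama instance effectively serves one generation at a time:

```python
from ragas import evaluate
from ragas.run_config import RunConfig
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision

run_config = RunConfig(
    timeout=600,     # allow up to 10 minutes per metric job
    max_workers=2,   # avoid flooding a single local Ollama server with parallel requests
    max_retries=1,   # fail fast instead of retrying a model that is simply slow
)

# Reuses the langchain_llm / langchain_embeddings / eval_dataset defined earlier
result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy,
             context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
    run_config=run_config,
)
```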
Hello @plaban1981, have you found any way around this? Or maybe @jjmachan, is the Ollama support back on? While working on a cookbook, I encountered a ConnectError for mistral:

```text
Evaluating: 100% 2/2 [02:23<00:00, 60.54s/it]
ERROR:ragas.executor:Exception raised in Job[1]: ConnectError(All connection attempts failed)
ERROR:ragas.executor:Exception raised in Job[0]: ConnectError(All connection attempts failed)
{
    "answer_relevancy": NaN
}
```
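`ConnectError(All connection attempts failed)` usually means the client never reached the Ollama server at all, rather than a model or timeout problem. A sketch that pins the server URL explicitly and sanity-checks it before evaluating (assuming a default local Ollama on port 11434; adjust the URL if the server runs elsewhere, e.g. inside Docker):

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

OLLAMA_URL = "http://localhost:11434"  # assumed default; change to your server's address

langchain_llm = ChatOllama(model="mistral", base_url=OLLAMA_URL)
langchain_embeddings = OllamaEmbeddings(model="mistral", base_url=OLLAMA_URL)

# If this invoke() fails, the evaluation will fail too; fix connectivity first
print(langchain_llm.invoke("ping"))
```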
@Sohammhatre10 The issue is still not resolved.
@jjmachan any help will be highly appreciated.
- [ ] I have checked the documentation and related resources and couldn't resolve my bug.

**Describe the bug**
`evaluate()` raises `TimeoutError()` for every metric job when using `ChatOllama` and `OllamaEmbeddings`, and all metric values come back as NaN.

Ragas version: 0.1.20
Python version: 3.11.0
**Code to Reproduce**

```python
import nest_asyncio
nest_asyncio.apply()

from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision

langchain_llm = ChatOllama(model="llama3.2")
langchain_embeddings = OllamaEmbeddings(model="llama3.2")

# Test whether the chat model works
langchain_llm.invoke("How are you ?")

result = evaluate(
    eval_dataset,
    metrics=[
        answer_correctness, answer_similarity,
        answer_relevancy,
        context_recall, context_precision,
    ],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
)

# Extract the details into a dataframe
results_df = result.to_pandas()
results_df
```

**Error trace**

```text
Evaluating:   0%|          | 0/50 [00:00<?, ?it/s]
Exception raised in Job[26]: TimeoutError()
Exception raised in Job[16]: TimeoutError()
Evaluating:   2%|▏         | 1/50 [03:00<2:27:00, 180.02s/it]
Exception raised in Job[33]: TimeoutError()
Exception raised in Job[9]: TimeoutError()
Exception raised in Job[34]: TimeoutError()
Exception raised in Job[25]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[10]: TimeoutError()
Exception raised in Job[17]: TimeoutError()
Exception raised in Job[4]: TimeoutError()
Exception raised in Job[18]: TimeoutError()
Exception raised in Job[43]: TimeoutError()
Exception raised in Job[8]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[27]: TimeoutError()
Exception raised in Job[42]: TimeoutError()
Exception raised in Job[36]: TimeoutError()
Evaluating:  34%|███▍      | 17/50 [06:00<10:04, 18.32s/it]
Exception raised in Job[19]: TimeoutError()
Exception raised in Job[46]: TimeoutError()
Exception raised in Job[21]: TimeoutError()
Exception raised in Job[11]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[12]: TimeoutError()
Exception raised in Job[20]: TimeoutError()
Exception raised in Job[37]: TimeoutError()
Exception raised in Job[28]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[38]: TimeoutError()
Exception raised in Job[47]: TimeoutError()
Exception raised in Job[30]: TimeoutError()
Exception raised in Job[0]: TimeoutError()
Exception raised in Job[45]: TimeoutError()
Exception raised in Job[1]: TimeoutError()
Evaluating:  66%|██████▌   | 33/50 [09:00<04:02, 14.24s/it]
Exception raised in Job[13]: TimeoutError()
Exception raised in Job[39]: TimeoutError()
Exception raised in Job[5]: TimeoutError()
Exception raised in Job[22]: TimeoutError()
Exception raised in Job[31]: TimeoutError()
Exception raised in Job[6]: TimeoutError()
Exception raised in Job[41]: TimeoutError()
Exception raised in Job[14]: TimeoutError()
Exception raised in Job[48]: TimeoutError()
Exception raised in Job[40]: TimeoutError()
Exception raised in Job[23]: TimeoutError()
Exception raised in Job[49]: TimeoutError()
Exception raised in Job[15]: TimeoutError()
Exception raised in Job[32]: TimeoutError()
Exception raised in Job[3]: TimeoutError()
Exception raised in Job[24]: TimeoutError()
Exception raised in Job[7]: TimeoutError()
Evaluating: 100%|██████████| 50/50 [12:00<00:00, 14.40s/it]
```

**Expected behavior**
It should return the values for the corresponding evaluation metrics.
**Additional context**

| question | contexts | answer | ground_truth | answer_correctness | answer_similarity | answer_relevancy | context_recall | context_precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| What should I be concerned about when consider... | [['Source: Casualty-Automotive, Vehicles and A... | When considering an autonomous vehicle exposur... | The rapid development of autonomous vehicle te... | NaN | NaN | NaN | NaN | NaN |
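A closing note on the table above: an all-NaN row is what `evaluate()` returns when every metric job for that row raises, so the NaNs here are the TimeoutErrors surfacing in the result rather than a separate scoring bug. In ragas 0.1.x, `evaluate()` also exposes a `raise_exceptions` flag; setting it to `True` (a sketch, reusing the objects from the reproduction code above) turns silent NaNs into a hard failure that carries the underlying exception:

```python
# Fail loudly instead of returning NaN for rows whose jobs raised
result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy,
             context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
    raise_exceptions=True,
)
```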