explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Evaluating: 100%|██████████| 50/50 [12:00<00:00, 14.40s/it] error encountered while evaluating using ChatOllama #1364

Open plaban1981 opened 1 month ago

plaban1981 commented 1 month ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
Running evaluate() with ChatOllama and OllamaEmbeddings raises TimeoutError for every job, and all metric values come back as NaN.

Ragas version: 0.1.20
Python version: 3.11.0

Code to Reproduce

import nest_asyncio
nest_asyncio.apply()

from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision

langchain_llm = ChatOllama(model="llama3.2")
langchain_embeddings = OllamaEmbeddings(model="llama3.2")

# Test whether the chat model works
langchain_llm.invoke("How are you ?")

result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
)

# Extract the details into a dataframe
results_df = result.to_pandas()
results_df

Error trace

Evaluating: 0%| | 0/50 [00:00<?, ?it/s]
Exception raised in Job[26]: TimeoutError()
Exception raised in Job[16]: TimeoutError()
Evaluating: 2%|▏ | 1/50 [03:00<2:27:00, 180.02s/it]
Exception raised in Job[33]: TimeoutError()
Exception raised in Job[9]: TimeoutError()
Exception raised in Job[34]: TimeoutError()
Exception raised in Job[25]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[10]: TimeoutError()
Exception raised in Job[17]: TimeoutError()
Exception raised in Job[4]: TimeoutError()
Exception raised in Job[18]: TimeoutError()
Exception raised in Job[43]: TimeoutError()
Exception raised in Job[8]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[27]: TimeoutError()
Exception raised in Job[42]: TimeoutError()
Exception raised in Job[36]: TimeoutError()
Evaluating: 34%|███▍ | 17/50 [06:00<10:04, 18.32s/it]
Exception raised in Job[19]: TimeoutError()
Exception raised in Job[46]: TimeoutError()
Exception raised in Job[21]: TimeoutError()
Exception raised in Job[11]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[12]: TimeoutError()
Exception raised in Job[20]: TimeoutError()
Exception raised in Job[37]: TimeoutError()
Exception raised in Job[28]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[38]: TimeoutError()
Exception raised in Job[47]: TimeoutError()
Exception raised in Job[30]: TimeoutError()
Exception raised in Job[0]: TimeoutError()
Exception raised in Job[45]: TimeoutError()
Exception raised in Job[1]: TimeoutError()
Evaluating: 66%|██████▌ | 33/50 [09:00<04:02, 14.24s/it]
Exception raised in Job[13]: TimeoutError()
Exception raised in Job[39]: TimeoutError()
Exception raised in Job[5]: TimeoutError()
Exception raised in Job[22]: TimeoutError()
Exception raised in Job[31]: TimeoutError()
Exception raised in Job[6]: TimeoutError()
Exception raised in Job[41]: TimeoutError()
Exception raised in Job[14]: TimeoutError()
Exception raised in Job[48]: TimeoutError()
Exception raised in Job[40]: TimeoutError()
Exception raised in Job[23]: TimeoutError()
Exception raised in Job[49]: TimeoutError()
Exception raised in Job[15]: TimeoutError()
Exception raised in Job[32]: TimeoutError()
Exception raised in Job[3]: TimeoutError()
Exception raised in Job[24]: TimeoutError()
Exception raised in Job[7]: TimeoutError()
Evaluating: 100%|██████████| 50/50 [12:00<00:00, 14.40s/it]

Expected behavior
It should return the values for the corresponding evaluation metrics.

Additional context

question | contexts | answer | ground_truth | answer_correctness | answer_similarity | answer_relevancy | context_recall | context_precision
-- | -- | -- | -- | -- | -- | -- | -- | --
What should I be concerned about when consider... | [['Source: Casualty-Automotive, Vehicles and A... | When considering an autonomous vehicle exposur... | The rapid development of autonomous vehicle te... | NaN | NaN | NaN | NaN | NaN
dosubot[bot] commented 1 month ago

I found a similar open issue that might help you resolve the TimeoutError when using the evaluate() function with Langchain-wrapped models in the ragas library. The issue is titled Facing error for evaluate() for Langchain instance LLM and Embedding models and is still open [1].

The suggested solution involves setting the timeout parameter in the RunConfig and ensuring it is passed to both the LLM and the embeddings. Here is an example of how to do it:

from ragas.run_config import RunConfig
from ragas.llms.base import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision
from ragas import evaluate
from datasets import Dataset

# Create a RunConfig with a specified timeout
run_config = RunConfig(timeout=60)  # Set timeout to 60 seconds

# Initialize the LLM with the run_config
langchain_llm = ChatOllama(model="llama3.2", run_config=run_config)

# Initialize the embeddings with the run_config
langchain_embeddings = OllamaEmbeddings(model="llama3.2", run_config=run_config)

# Create the dataset
data_samples = {
    'question': ['What should I be concerned about when considering an autonomous vehicle exposure?'],
    'answer': ['When considering an autonomous vehicle exposure, you should be aware of the rapid development of autonomous vehicle technology.'],
    'contexts': [['Source: Casualty-Automotive, Vehicles and Autonomous Vehicles. The rapid development of autonomous vehicle technology is a key concern.']],
}
dataset = Dataset.from_dict(data_samples)

# Use the evaluate function
result = evaluate(dataset, metrics=[answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision], llm=langchain_llm, embeddings=langchain_embeddings)

# Extract the details into a dataframe
results_df = result.to_pandas()
print(results_df)

Additionally, you can enable debugging and logging options to diagnose the TimeoutError by setting the log_tenacity parameter in RunConfig to True and enabling the RAGAS_DEBUG environment variable:

import os
os.environ["RAGAS_DEBUG"] = "true"

run_config = RunConfig(timeout=60, log_tenacity=True)

This will provide comprehensive logging and debugging information [1].
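If the jobs keep timing out, it can also help to surface the underlying exception instead of silently getting NaN scores. The following is a minimal sketch, reusing the dataset, langchain_llm, and langchain_embeddings defined above, and assuming the installed ragas version exposes a raise_exceptions flag on evaluate():

result = evaluate(
    dataset,
    metrics=[answer_correctness],  # start with a single metric to isolate the failure
    llm=langchain_llm,
    embeddings=langchain_embeddings,
    raise_exceptions=True,  # assumption: this ragas version accepts raise_exceptions on evaluate()
)

Running a single metric over one row first makes it easier to see whether the local model ever responds within the timeout at all.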

To continue talking to Dosu, mention @dosu.

plaban1981 commented 1 month ago

@dosu I encountered the same error even after including the RunConfig with the timeout set to 60.

CODE

import os
from ragas.run_config import RunConfig

os.environ["RAGAS_DEBUG"] = "true"

run_config = RunConfig(timeout=60, log_tenacity=True)

# Initialize the LLM with the run_config
langchain_llm = ChatOllama(model="llama3.2", run_config=run_config)

# Initialize the embeddings with the run_config
langchain_embeddings = OllamaEmbeddings(model="llama3.2")

result = evaluate(
    eval_dataset,
    metrics=[
        answer_correctness,
        answer_similarity,
        answer_relevancy,
        context_recall,
        context_precision,
    ],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
)

ERROR

Evaluating:   0%|          | 0/5 [00:00<?, ?it/s]Exception raised in Job[1]: TimeoutError()
Evaluating:  20%|██        | 1/5 [03:00<12:00, 180.02s/it]Exception raised in Job[0]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[4]: TimeoutError()
Exception raised in Job[3]: TimeoutError()
Evaluating: 100%|██████████| 5/5 [03:00<00:00, 36.01s/it] 
dosubot[bot] commented 1 month ago

To resolve the TimeoutError when using the evaluate() function with ChatOllama and OllamaEmbeddings, ensure that the timeout parameter is correctly passed to both the LLM and the embeddings. It looks like the run_config is not being passed to the OllamaEmbeddings in your code. Here is the corrected version:

import os
from ragas.run_config import RunConfig
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision

# Enable debugging
os.environ["RAGAS_DEBUG"] = "true"

# Create a RunConfig with a specified timeout
run_config = RunConfig(timeout=60, log_tenacity=True)

# Initialize the LLM with the run_config
langchain_llm = ChatOllama(model="llama3.2", run_config=run_config)

# Initialize the embeddings with the run_config
langchain_embeddings = OllamaEmbeddings(model="llama3.2", run_config=run_config)

# Use the evaluate function
result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
)

# Extract the details into a dataframe
results_df = result.to_pandas()
print(results_df)

Ensure that both ChatOllama and OllamaEmbeddings are correctly configured to respect the RunConfig settings, including the timeout parameter [1][2].
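As an alternative, the timeout can be applied at the evaluation level rather than on the model constructors. This is a sketch reusing the eval_dataset, LLM, and embeddings defined above, and assuming evaluate() in this ragas version accepts a run_config argument and that RunConfig supports max_workers; a longer timeout and fewer workers give a slow local model room to respond:

from ragas import evaluate
from ragas.run_config import RunConfig

# Assumptions: evaluate() accepts run_config, and RunConfig supports max_workers.
run_config = RunConfig(timeout=600, max_workers=1)  # 10-minute per-job timeout, one request at a time

result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_similarity, answer_relevancy, context_recall, context_precision],
    llm=langchain_llm,
    embeddings=langchain_embeddings,
    run_config=run_config,
)

Reducing max_workers limits concurrent requests to the local Ollama server, which is often what pushes individual jobs past the timeout.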

It looks like we're facing a persistent issue here. @jjmachan, your expertise would be greatly appreciated to help resolve this.

To continue talking to Dosu, mention @dosu.

plaban1981 commented 1 month ago

@jjmachan
Could you please help me? We are trying to evaluate RAG responses using Ollama on our local system and encounter the above error. The same error also persists when using Groq.

Any help will be much appreciated. Thank You

plaban1981 commented 1 month ago

@dosu / @jjmachan

Any updates on how to resolve the issue?

dosubot[bot] commented 1 month ago

A maintainer has already been looped in.

To continue talking to Dosu, mention @dosu.

jjmachan commented 1 month ago

hey @plaban1981 - great to see you again 🙂

currently Ollama models are not supported, sadly - could you try any of the other model providers?
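For reference, here is a minimal sketch with a hosted provider, reusing the eval_dataset from above; it assumes an OPENAI_API_KEY is set in the environment, that langchain_openai is installed, and the model names are only illustrative:

import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_relevancy, context_recall, context_precision

# Assumption: OPENAI_API_KEY is set; any LangChain-compatible chat model and embeddings can be swapped in.
assert os.environ.get("OPENAI_API_KEY"), "set OPENAI_API_KEY first"

llm = ChatOpenAI(model="gpt-4o-mini")                           # illustrative model name
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")   # illustrative model name

result = evaluate(
    eval_dataset,
    metrics=[answer_correctness, answer_relevancy, context_recall, context_precision],
    llm=llm,
    embeddings=embeddings,
)
print(result.to_pandas())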

plaban1981 commented 1 month ago

@jjmachan Thank you. I did try with Groq models as well, but I encountered the same issue. I would be grateful if you could help me with a code sample where RAGAS evaluation can be done using an open-source model. Thanks

plaban1981 commented 1 month ago

@jjmachan

https://github.com/mosh98/RAG_With_Models/blob/main/evaluation/RAGAS%20DEMO.ipynb. Please check this notebook; they have implemented RAGAS using ChatOllama and were able to produce results.

jjmachan commented 1 month ago

@plaban1981 thanks a lot for sharing that! did that work for you too?

plaban1981 commented 1 month ago

@jjmachan, unfortunately no. Can you please help me with any working examples of RAGAS evaluation using an open-source model?

Thank you, Plaban

Sohammhatre10 commented 1 month ago

Hello @plaban1981, have you found any way around this? Or maybe @jjmachan, is the Ollama support back on? I am working on creating a cookbook and encountered a ConnectError for mistral:

Evaluating: 100% 2/2 [02:23<00:00, 60.54s/it]
ERROR:ragas.executor:Exception raised in Job[1]: ConnectError(All connection attempts failed)
ERROR:ragas.executor:Exception raised in Job[0]: ConnectError(All connection attempts failed)
{
   "answer_relevancy": NaN
}
plaban1981 commented 3 weeks ago

> Hello @plaban1981, have you found any way around this? Or maybe @jjmachan, is the Ollama support back on? I am working on creating a cookbook and encountered a ConnectError for mistral:
>
> Evaluating: 100% 2/2 [02:23<00:00, 60.54s/it]
> ERROR:ragas.executor:Exception raised in Job[1]: ConnectError(All connection attempts failed)
> ERROR:ragas.executor:Exception raised in Job[0]: ConnectError(All connection attempts failed)
> {
>    "answer_relevancy": NaN
> }

@Sohammhatre10 The issue is still not resolved

@jjmachan any help will be highly appreciated