explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Local Model Runner in Executor raises exceptions #1226

Open g-hano opened 2 months ago

g-hano commented 2 months ago

Describe the bug
I want to use local LLMs to evaluate my RAG app. I have tried Ollama and HuggingFace models, but neither of them works.

Ragas version: 0.1.11
Python version: 3.11.3

Code to Reproduce

import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
tokenizer.pad_token_id=tokenizer.eos_token_id
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

df = pd.read_csv("output.csv", sep=";")
data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if isinstance(x, str) else x).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}

from datasets import Dataset 
dataset = Dataset.from_dict(data_samples)

from ragas import evaluate
from ragas.metrics import (faithfulness, 
                           answer_correctness,    
                           answer_relevancy,
                           context_recall,
                           context_precision)

from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
from ragas.llms import LangchainLLMWrapper

pipe = pipeline(model=model, tokenizer=tokenizer, task="text-generation")
hf_pipe = HuggingFacePipeline(pipeline=pipe)
ragas_llm = LangchainLLMWrapper(langchain_llm=hf_pipe)
ragas_embed = HuggingFaceEmbeddings(model_name="nomic-ai/nomic-embed-text-v1.5", model_kwargs={"trust_remote_code": True})

metrics=[faithfulness, 
        answer_correctness, 
        answer_relevancy, 
        context_recall,
        context_precision]
score = evaluate(dataset=dataset,
                 metrics=metrics,
                 llm=ragas_llm,
                 embeddings=ragas_embed,
                 raise_exceptions=False
                 )

Error trace

Runner in Executor raised an exception
Runner in Executor raised an exception
Runner in Executor raised an exception
Runner in Executor raised an exception
Runner in Executor raised an exception
Runner in Executor raised an exception
Runner in Executor raised an exception
Runner in Executor raised an exception

Expected behavior
Successful evaluation.

dosubot[bot] commented 2 months ago

The "Runner in Executor raised an exception" errors you are seeing with Ragas 0.1.11 on Python 3.11.3 when evaluating with Ollama or HuggingFace models are most likely caused by the results list coming back empty, which triggers an ExceptionInRunner.


In your case, you are already using raise_exceptions=False, which should prevent the exceptions from stopping the execution. However, the underlying issue might still be that the models are not producing the expected results, leading to an empty results list.

Ensure that the models are correctly configured and producing results. Additionally, verify that the environment is correctly set up, especially if running in a Jupyter-like environment, as the nest_asyncio package may be required to handle nested event loops.
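
For example, in a notebook you could enable nested event loops before calling evaluate (a minimal sketch, assuming the nest_asyncio package is installed):

# Minimal sketch: let ragas' async Executor run inside a notebook's
# already-running event loop (assumes nest_asyncio is installed).
import nest_asyncio

nest_asyncio.apply()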

Here is a snippet of the relevant code that raises the exception when the results list is empty:

try:
    results = exec.results()
    if results == []:
        raise ExceptionInRunner()
except Exception as e:
    raise e

Make sure that your models are correctly configured and producing results. If the issue persists, you might want to add additional logging or debugging to understand why the results list is empty.
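
One way to do that is to raise the log level of the ragas loggers with Python's standard logging module (a sketch; the "ragas" parent logger name is assumed from the library's module-level loggers):

import logging

# Surface the errors that the Executor only logs when raise_exceptions=False;
# "ragas" is assumed to be the parent logger name for the library's modules.
logging.basicConfig(level=logging.INFO)
logging.getLogger("ragas").setLevel(logging.DEBUG)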

For more details, you can refer to the source code of the Executor class in ragas [1].

To continue talking to Dosu, mention @dosu.

g-hano commented 2 months ago

@dosu I still have the same issue; now I am trying with Ollama models.

data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if isinstance(x, str) else x).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}
dataset = Dataset.from_dict(data_samples)
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
langchain_llm = ChatOllama(model="llama3.1:8b")
langchain_embeddings = OllamaEmbeddings(model="nomic-embed-text:latest")
metrics=[faithfulness, 
        answer_correctness, 
        answer_relevancy, 
        context_recall,
        context_precision]

score = evaluate(dataset=dataset,
        metrics=metrics,
        llm=langchain_llm,
        embeddings=langchain_embeddings,
        raise_exceptions=False
        )
Runner in Executor raised an exception
dosubot[bot] commented 2 months ago

The issue with the Local Model Runner in the Executor raising exceptions when using Ollama models is likely due to the raise_exceptions attribute being set to True. This causes any exception that occurs within the wrap_callable_with_index method to be raised instead of being logged and handled gracefully.

To prevent exceptions from being raised and instead log them, ensure that raise_exceptions is set to False when initializing the Executor. Here is the relevant code snippet from the Executor class:

@dataclass
class Executor:
    # ... other fields ...
    raise_exceptions: bool = False
    # ... other fields ...

    def wrap_callable_with_index(self, callable: t.Callable, counter):
        async def wrapped_callable_async(*args, **kwargs):
            result = np.nan
            try:
                result = await callable(*args, **kwargs)
            except MaxRetriesExceeded as e:
                # this only for testset generation v2
                logger.warning(f"max retries exceeded for {e.evolution}")
            except Exception as e:
                if self.raise_exceptions:
                    raise e
                else:
                    exec_name = type(e).__name__
                    exec_message = str(e)
                    logger.error(
                        "Exception raised in Job[%s]: %s(%s)",
                        counter,
                        exec_name,
                        exec_message,
                        exc_info=False,
                    )

            return counter, result

        return wrapped_callable_async

Ensure that raise_exceptions is set to False when initializing the Executor:

score = evaluate(
    dataset=dataset,
    metrics=metrics,
    llm=langchain_llm,
    embeddings=langchain_embeddings,
    raise_exceptions=False
)

If the issue persists, it might be due to the models not producing the expected results, leading to an empty results list. Verify that the models are correctly configured and producing results. Additionally, ensure that the environment is correctly set up, especially if running in a Jupyter-like environment, as the nest_asyncio package may be required to handle nested event loops [1].
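
As a quick sanity check that the local Ollama models are reachable and producing output (a sketch reusing the langchain_llm and langchain_embeddings objects from the snippet above), you can call them directly before running the full evaluation:

# Sanity check: make sure the local Ollama server answers before evaluating.
# Reuses langchain_llm (ChatOllama) and langchain_embeddings (OllamaEmbeddings).
print(langchain_llm.invoke("Reply with the single word: ok"))
print(len(langchain_embeddings.embed_query("hello world")))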

If you continue to face issues, consider adding additional logging or debugging to understand why the results list is empty.

To continue talking to Dosu, mention @dosu.

g-hano commented 2 months ago

@dosu I now get

Exception raised in Job[263]: TimeoutError()
Exception raised in Job[84]: TimeoutError()
Exception raised in Job[457]: TimeoutError()
Exception raised in Job[76]: TimeoutError()
dosubot[bot] commented 2 months ago

To resolve the TimeoutError exceptions raised in the Job when using Ollama models with the Local Model Runner in Executor, you can modify the wrap_callable_with_index method to specifically handle TimeoutError exceptions. Here is an example of how you can do this:

def wrap_callable_with_index(self, callable: t.Callable, counter):
    async def wrapped_callable_async(*args, **kwargs):
        result = np.nan
        try:
            result = await callable(*args, **kwargs)
        except MaxRetriesExceeded as e:
            # this only for testset generation v2
            logger.warning(f"max retries exceeded for {e.evolution}")
        except TimeoutError as e:
            logger.error(f"TimeoutError in Job[{counter}]: {str(e)}")
            if self.raise_exceptions:
                raise e
        except Exception as e:
            if self.raise_exceptions:
                raise e
            else:
                exec_name = type(e).__name__
                exec_message = str(e)
                logger.error(
                    "Exception raised in Job[%s]: %s(%s)",
                    counter,
                    exec_name,
                    exec_message,
                    exc_info=False,
                )

        return counter, result

    return wrapped_callable_async

This modification ensures that TimeoutError exceptions are logged and handled appropriately, and optionally re-raised if self.raise_exceptions is set to True [1].

Additionally, you can adjust the timeout duration for jobs using the timeout parameter in the RunConfig class. By default, the timeout is set to 180 seconds. You can increase this duration if needed:

from ragas import RunConfig

config = RunConfig(timeout=300)  # Set timeout to 300 seconds
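
The config is then passed to evaluate via its run_config argument (a sketch, assuming evaluate() accepts run_config as in recent ragas versions; reuses the dataset, metrics, and Ollama models from the snippets above):

# Sketch: give slow local models more time per job by passing the RunConfig
# to evaluate(); the run_config parameter name is assumed from the ragas API.
score = evaluate(
    dataset=dataset,
    metrics=metrics,
    llm=langchain_llm,
    embeddings=langchain_embeddings,
    run_config=RunConfig(timeout=300),
    raise_exceptions=False,
)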

If you want more detailed logging about the TimeoutError exceptions, you can enable the log_tenacity attribute in the RunConfig class:

run_config = RunConfig(log_tenacity=True)

This will log retry attempts, including TimeoutError exceptions, using the tenacity library's logging capabilities [2].

To continue talking to Dosu, mention @dosu.