explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Nan Values for the answer_relevancy: Exception raised in Job[3]: NotFoundError(Error code: 404). #1221

Closed. Raushan999 closed this issue 2 months ago.

Raushan999 commented 2 months ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.
[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

Library versions used: openai==1.42.0, ragas==0.1.14, langchain==0.2.3, python==3.11.9. The problem persists with the latest version of ragas as well, and moving to lower versions of ragas starts throwing various import errors.

Code Examples

from langchain.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

azure_model = AzureChatOpenAI(
    openai_api_version=openai_api_version,
    azure_endpoint=openai_api_base,
    azure_deployment=deployment_name,
    model=model_name,
    validate_base_url=False,
)
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version=openai_api_version,
    azure_endpoint=search_endpoint,
    azure_deployment=azure_embedding_deployment,
    model=azure_embedding_model,
)
result = evaluate(
    dataset, metrics=[faithfulness, answer_relevancy], llm=azure_model, embeddings=azure_embeddings
)
result.to_pandas()

Additional context: screenshots attached.

dosubot[bot] commented 2 months ago

Hey @Raushan999! I'm here to help you with your bug, answer your questions, and guide you on how to contribute.

I found a similar closed issue that might be relevant to your problem. The issue was related to NaN values for context recall and faithfulness metrics due to the default max_output_tokens parameter being too small. Increasing this parameter resolved the issue [1].
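
If that turns out to be the cause here as well, one thing worth trying is raising the completion token limit on the Azure chat model. A minimal sketch, assuming the langchain_openai AzureChatOpenAI wrapper; the max_tokens value and deployment details are placeholders:

from langchain_openai.chat_models import AzureChatOpenAI

azure_model = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint="https://<your-endpoint>.openai.azure.com/",
    azure_deployment="your-deployment-name",
    model="your-model-name",
    max_tokens=1024,  # raise this if the metric's LLM output is getting truncated
    validate_base_url=False,
)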

For your specific problem with answer_relevancy, ensure that you have correctly initialized the answer_relevancy metric with the appropriate LLM and embeddings. Here is a detailed code snippet to help you:

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy
from ragas.run_config import RunConfig
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings

# util function to init Ragas Metrics
def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        # attach the LLM and/or embeddings each metric needs
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        # each metric is then initialized with a run config
        run_config = RunConfig()
        metric.init(run_config)

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings

# Initialize Azure models
azure_model = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint="https://<your-endpoint>.openai.azure.com/",
    azure_deployment="your-deployment-name",
    model="your-model-name",
    validate_base_url=False,
)

azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint="https://<your-endpoint>.openai.azure.com/",
    azure_deployment="your-embedding-deployment",
    model="text-embedding-ada-002",
)

# Initialize metrics
metrics = [faithfulness, answer_relevancy]
init_ragas_metrics(metrics, llm=azure_model, embedding=azure_embeddings)

# Evaluate
result = evaluate(
    dataset, metrics=metrics, llm=azure_model, embeddings=azure_embeddings
)
result.to_pandas()

Ensure that your Azure configurations are correctly set up and that you have set your Azure OpenAI key in your environment:

import os
os.environ["AZURE_OPENAI_API_KEY"] = "your-azure-openai-key"

Additionally, make sure that the AnswerRelevancy class is correctly implemented and that the calculate_similarity and _calculate_score methods are functioning as expected. Proper initialization of the LLM and embeddings is crucial to avoid NaN values [2].
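
As a concrete check of that initialization, here is a hedged sketch, assuming ragas 0.1.x where LangchainLLMWrapper and LangchainEmbeddingsWrapper adapt LangChain objects to the interface the metrics expect; it wires up the answer_relevancy metric explicitly using the helper defined above:

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.metrics import answer_relevancy

# wrap the LangChain model and embeddings in ragas' adapter classes
wrapped_llm = LangchainLLMWrapper(azure_model)
wrapped_embeddings = LangchainEmbeddingsWrapper(azure_embeddings)

# reuse init_ragas_metrics so the metric gets its llm, embeddings, and run config
init_ragas_metrics([answer_relevancy], llm=wrapped_llm, embedding=wrapped_embeddings)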

If you continue to face issues, you might want to check if there are any specific limitations or additional configurations required for using Azure-based models and embeddings with the Ragas framework [3].
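
Another configuration worth knowing about is ragas' RunConfig, which controls timeouts and retries for the evaluation jobs. A hedged sketch, assuming the ragas 0.1.x evaluate() signature accepts a run_config argument; the numbers are placeholders:

from ragas import evaluate
from ragas.run_config import RunConfig

# relax timeouts and retries, which rate-limited Azure deployments may need
run_config = RunConfig(timeout=120, max_retries=15, max_wait=90)

result = evaluate(
    dataset,
    metrics=metrics,
    llm=azure_model,
    embeddings=azure_embeddings,
    run_config=run_config,
)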

To continue talking to Dosu, mention @dosu.

Raushan999 commented 2 months ago

Checked this; it's still not working.

Now I'm getting this error: AttributeError: 'AzureChatOpenAI' object has no attribute 'set_run_config'

Raushan999 commented 2 months ago

I resolved this issue myself. Thanks!!

MartinPanaget commented 2 months ago

Hi @Raushan999, how did you resolve the problem? I have the same issue.

Raushan999 commented 2 months ago

Hi @MartinPanaget, I actually had various other dependencies installed that were preventing the embeddings from working, so I just uninstalled everything from my requirements.txt file and reinstalled. These are the libraries I used (a quick version check is sketched after the list):

openpyxl
datasets==2.19.1
ragas==0.1.5
openai==1.16.0
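
If you want to confirm that the reinstall actually picked up these pins, a quick sanity check (assuming the packages above are installed) is:

import datasets
import openai
import ragas

# confirm the environment matches the working pins listed above
print("datasets:", datasets.__version__)
print("openai:", openai.__version__)
print("ragas:", ragas.__version__)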