explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Get an error 'Invalid n value (currently only n = 1 is supported)' when calculating the score of the 'answer_relevancy' metric #1432

Open francescofan opened 1 month ago

francescofan commented 1 month ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

Error message is "Invalid n value (currently only n = 1 is supported)".

Ragas version: 0.1.20
Python version: 3.11.9
Python environment: Anaconda 3
LLM: Deepseek v2.0
System: Windows 10 Enterprise
Development environment: Visual Studio Code 1.85.0

Code to Reproduce

from datasets import Dataset
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.metrics import (faithfulness, context_recall, context_precision,
                           answer_correctness, answer_similarity, answer_relevancy)

# model, api_key, and base_url point at the Deepseek (OpenAI-compatible) endpoint
deepseek_llm = ChatOpenAI(
    model=model,
    openai_api_key=api_key,
    openai_api_base=base_url,
)
embeddings = MyCustomEmbeddings("http://customized.embedding")  # custom embeddings class (not shown)
data = {
    'question': ['When was the first super bowl?'],
    'answer': ['The first superbowl was held on Jan 15, 1967'],
    'contexts': [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,']],
    'ground_truth': ['The first superbowl was held on January 15, 1967'],
}
dataset = Dataset.from_dict(data)
score = evaluate(
    dataset,
    metrics=[faithfulness, context_recall, context_precision,
             answer_correctness, answer_similarity, answer_relevancy],
    llm=deepseek_llm,
    embeddings=embeddings,
)

Error trace

Exception raised in Job[5]: BadRequestError(Error code: 400 - {'error': {'message': 'Invalid n value (currently only n = 1 is supported)', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_request_error'}})

Expected behavior

Get the score of answer_relevancy without any error.

Additional context

I get 'answer_relevancy': nan when I use llm=deepseek_llm, embeddings=embeddings, but 'answer_relevancy': 0.980756386361107 when I use the OpenAI LLM and embeddings.

Is there any difference between the OpenAI and Deepseek APIs here, for example in their support for multiple completions (n > 1)? How can I confirm this?
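One way to check this directly (a minimal probe, sketched on the assumption that the Deepseek endpoint is OpenAI-compatible and reachable with the same model, api_key, and base_url used in the repro code) is to request more than one completion and see whether the API rejects it:

from openai import OpenAI

client = OpenAI(api_key=api_key, base_url=base_url)  # same credentials as the repro code
try:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello."}],
        n=2,  # answer_relevancy requests multiple completions when strictness > 1
    )
    print(f"API returned {len(resp.choices)} choices, so n > 1 is supported")
except Exception as exc:
    print(f"n > 1 was rejected: {exc}")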

dosubot[bot] commented 1 month ago

The error message "Invalid n value (currently only n = 1 is supported)" suggests that the 'answer_relevancy' metric in the Ragas framework is currently configured to only support an 'n' value of 1. This 'n' value, referred to as 'strictness' in the framework, indicates the number of questions generated per answer. The default 'strictness' value is set to 3, which might be causing the issue when using the Deepseek LLM [1][2].

To resolve this, you can try setting the 'strictness' value to 1 when configuring the 'answer_relevancy' metric. This adjustment should align with the current support for 'n = 1' and help avoid the error.

Regarding the difference between OpenAI and Deepseek LLMs, the Ragas framework supports all LLMs and embeddings available in Langchain, and any custom LLM or embeddings should be a subclass of the base classes BaseRagasLLM and BaseRagasEmbeddings [3]. However, specific differences in API request handling between OpenAI and Deepseek LLMs within the Ragas framework are not detailed in the available information. You might want to check if there are any specific configurations or limitations in the Deepseek API that could affect the 'answer_relevancy' metric.
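For reference, a custom embeddings class along the lines of MyCustomEmbeddings (a hypothetical sketch; the class name, endpoint, and request/response format here are placeholders, not the implementation from this issue) only needs to implement LangChain's Embeddings interface, which the ragas evaluate() call can consume:

from typing import List
import requests
from langchain_core.embeddings import Embeddings

class MyCustomEmbeddings(Embeddings):
    """Hypothetical wrapper around a self-hosted embedding endpoint."""

    def __init__(self, url: str):
        self.url = url

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Assumes the endpoint accepts {"inputs": [...]} and returns {"embeddings": [[...], ...]}
        resp = requests.post(self.url, json={"inputs": texts})
        resp.raise_for_status()
        return resp.json()["embeddings"]

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]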


francescofan commented 1 month ago

@dosu You said: "To resolve this, you can try setting the 'strictness' value to 1 when configuring the 'answer_relevancy' metric. This adjustment should align with the current support for 'n = 1' and help avoid the error." Please tell me in detail how to do that.

dosubot[bot] commented 1 month ago

To set the 'strictness' value to 1 for the 'answer_relevancy' metric, you need to modify the strictness attribute in the ResponseRelevancy class. Here's how you can do it:

@dataclass
class ResponseRelevancy(MetricWithLLM, MetricWithEmbeddings, SingleTurnMetric):
    # Other attributes and methods...

    strictness: int = 1  # Set strictness to 1

This change will ensure that only one question is generated per answer, which should resolve the "Invalid n value (currently only n = 1 is supported)" error you are encountering [1].
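Rather than editing the dataclass definition inside the installed package, it should also be possible to override the field on the metric instance that ragas exports (a brief sketch, assuming the answer_relevancy object imported in the repro code exposes the same strictness attribute):

from ragas.metrics import answer_relevancy

answer_relevancy.strictness = 1  # instance-level override, no changes to site-packages needed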


francescofan commented 1 month ago

@dosu I have modified the file "Lib\site-packages\ragas\metrics\_answer_relevance.py" as follows:

class AnswerRelevancy(MetricWithLLM, MetricWithEmbeddings, SingleTurnMetric):
...
    question_generation: Prompt = field(default_factory=lambda: QUESTION_GEN)
    # strictness: int = 3
    strictness: int = 1

Another error occurred after I changed the strictness attribute in the AnswerRelevancy class.

Error: Exception raised in Job[5]: RuntimeError(This event loop is already running)
Result: 'answer_relevancy': nan

dosubot[bot] commented 1 month ago

The error "RuntimeError(This event loop is already running)" typically occurs in environments where an event loop is already active, such as Jupyter notebooks. To handle this, the Ragas framework uses the nest_asyncio library to allow nested event loops. You can resolve this error by installing nest_asyncio and applying it in your code. Here's how you can do it:

  1. Install nest_asyncio if you haven't already:

    pip install nest_asyncio
  2. Apply nest_asyncio in your script before running the evaluate function:

    import nest_asyncio
    nest_asyncio.apply()

This should help prevent the "RuntimeError(This event loop is already running)" error. If you continue to see 'answer_relevancy': nan, it might be due to other issues with the LLM or embeddings configuration, so ensure they are correctly set up and compatible with the Ragas framework [1].


francescofan commented 1 month ago

Thank you @dosu. It works. Two steps, as follows:

Step 1: modify the py file as follows:

class AnswerRelevancy(MetricWithLLM, MetricWithEmbeddings, SingleTurnMetric):
...
    question_generation: Prompt = field(default_factory=lambda: QUESTION_GEN)
    # strictness: int = 3
    strictness: int = 1

Step 2: add the following code before calling evaluate:

import nest_asyncio
nest_asyncio.apply()
evaluate(...)
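Putting the whole fix together without patching the installed package (a sketch under the same assumptions as above: dataset, deepseek_llm, and embeddings are defined as in the original repro code, and the imported answer_relevancy instance exposes a strictness field):

import nest_asyncio
from ragas import evaluate
from ragas.metrics import answer_relevancy

nest_asyncio.apply()             # allow ragas to run inside an already-running event loop
answer_relevancy.strictness = 1  # keep question generation at n = 1 for the Deepseek endpoint

score = evaluate(
    dataset,
    metrics=[answer_relevancy],
    llm=deepseek_llm,
    embeddings=embeddings,
)
print(score)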