explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Invalid n value (currently only n = 1 is supported) #1072

Open jidechao opened 5 months ago

jidechao commented 5 months ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

openai.BadRequestError: Error code: 400 - {'detail': 'Invalid n value (currently only n = 1 is supported)'}

Ragas version: 0.1.10

Code to Reproduce

from ragas import evaluate
from langchain_openai import ChatOpenAI, OpenAI
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from datasets import Dataset
from langchain.vectorstores.milvus import Milvus

# embeddings (HuggingFace BGE)
model_name = "BAAI/bge-m3"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

import nest_asyncio

nest_asyncio.apply()

llm = ChatOpenAI(model="deepseek-chat", api_key="sk-*******", base_url="https://api.deepseek.com/v1", temperature=0, n=1)

from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_relevancy,
    context_recall,
    context_precision,
)

data = {
    'question': ['What is the capital of France?'],
    'contexts': [['Paris is the capital of France.']],
    'answer': ['Paris'],
    'ground_truth': ['Paris']
}
dataset = Dataset.from_dict(data)

result = evaluate(
    dataset = dataset, 
    metrics=[
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
    ],
    llm=llm,
    embeddings=embeddings,
)

result

Error trace
Exception in thread Thread-11:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 95, in run
    results = self.loop.run_until_complete(self._aresults())
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 83, in _aresults
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 78, in _aresults
    r = await future
  File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 37, in sema_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 111, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ragas/metrics/base.py", line 125, in ascore
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ragas/metrics/base.py", line 121, in ascore
    score = await self._ascore(row=row, callbacks=group_cm, is_async=is_async)
  File "/usr/local/lib/python3.10/dist-packages/ragas/metrics/_answer_relevance.py", line 152, in _ascore
    result = await self.llm.generate(
  File "/usr/local/lib/python3.10/dist-packages/ragas/llms/base.py", line 93, in generate
    return await agenerate_text_with_retry(
  File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ragas/llms/base.py", line 170, in agenerate_text
    return await self.langchain_llm.agenerate_prompt(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 691, in agenerate_prompt
    return await self.agenerate(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 651, in agenerate
    raise exceptions[0]
  File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 836, in _agenerate_with_cache
    result = await self._agenerate(
  File "/usr/local/lib/python3.10/dist-packages/langchain_openai/chat_models/base.py", line 674, in _agenerate
    response = await self.async_client.create(**payload)
  File "/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py", line 1289, in create
    return await self._post(
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1816, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1514, in request
    return await self._request(
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1610, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'detail': 'Invalid n value (currently only n = 1 is supported)'}

dosubot[bot] commented 5 months ago

Hello, @jidechao! I'm here to help you with any bugs or questions you have. Let's work together to solve this issue.

To resolve the openai.BadRequestError: Error code: 400 - {'detail': 'Invalid n value (currently only n = 1 is supported)'} error, ensure that the n parameter is set to 1 in your ChatOpenAI model configuration. The ragas library currently supports only n = 1 for text generation.

Here's the corrected part of your code:

llm = ChatOpenAI(
    model="deepseek-chat",
    api_key="sk-36906d31048e43c1a7ba5d7b9b45ea0f",
    base_url="https://api.deepseek.com/v1",
    temperature=0,
    n=1  # Ensure n is set to 1
)

Make sure that the n parameter is not set to any value other than 1. This should resolve the error related to the "Invalid n value" [1].

To continue talking to Dosu, mention @dosu.

jidechao commented 5 months ago

> To resolve the openai.BadRequestError: Error code: 400 - {'detail': 'Invalid n value (currently only n = 1 is supported)'} error, ensure that the n parameter is set to 1 in your ChatOpenAI model configuration. The ragas library currently supports only n = 1 for text generation. [...]

I tested it and found that with three metrics it runs fine, but with four metrics it reports the error: Invalid n value (currently only n = 1 is supported).
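
A quick way to narrow down which metric is responsible (a rough sketch, assuming the dataset, llm, and embeddings objects from the reproduction snippet above are already defined):

from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Run each metric on its own so the failing one is easy to spot.
for metric in [context_precision, context_recall, faithfulness, answer_relevancy]:
    try:
        result = evaluate(dataset=dataset, metrics=[metric], llm=llm, embeddings=embeddings)
        print(metric.name, result)
    except Exception as e:
        print(f"{metric.name} failed: {e}")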

dosubot[bot] commented 5 months ago

The "Invalid n value" error when evaluating four metrics simultaneously is likely due to the internal handling of multiple metrics in the ragas library. Here are a few steps to ensure your setup is correct:

  1. Ensure Dataset Format: Verify that your dataset has the correct format and column names (question, contexts, answer, ground_truth).

  2. Check Metric Initialization: Ensure that all metrics are properly initialized with the required language models (LLMs) and embeddings.

  3. Set n Parameter Correctly: The n parameter in the ChatOpenAI class should be set to 1, as the library currently supports only n = 1 for text generation.

Here is an example of how to structure your dataset and call the evaluate function:

from ragas import evaluate
from datasets import Dataset
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

# Example dataset
data = {
    'question': ["What is the capital of France?"],
    'contexts': [["Paris is the capital of France."]],
    'answer': ["Paris"],
    'ground_truth': ["Paris"]
}

dataset = Dataset.from_dict(data)

# Initialize LLM and embeddings
llm = ChatOpenAI(model="deepseek-chat", api_key="sk-*******", base_url="https://api.deepseek.com/v1", temperature=0, n=1)
embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-m3", model_kwargs={'device': 'cpu'}, encode_kwargs={'normalize_embeddings': True})

# Import metrics
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# Call the evaluate function
result = evaluate(
    dataset=dataset,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
    llm=llm,
    embeddings=embeddings,
)

print(result)

Ensure that the n parameter is set to 1 in the ChatOpenAI configuration and that the dataset is correctly formatted; together these should resolve the "Invalid n value" error [1][2].

To continue talking to Dosu, mention @dosu.

jidechao commented 5 months ago

I tested it again, and it seems that answer_relevancy has an issue, while the other metrics run normally.

Error case: (screenshot)

Correct case: (screenshot)
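
A rough guess at why only this metric fails, based on the traceback: answer_relevancy asks the LLM for several completions per prompt (its strictness setting, which defaults to 3), so it is presumably the only metric here that ever sends n > 1 to the backend. If that is the case, constructing the metric with strictness=1 might work around the error, at the cost of a noisier score:

from ragas import evaluate
from ragas.metrics import AnswerRelevancy

# Hypothetical workaround: request a single completion per prompt so the
# backend never receives n > 1. Less robust than the default strictness of 3.
answer_relevancy_single = AnswerRelevancy(strictness=1)

result = evaluate(
    dataset=dataset,
    metrics=[answer_relevancy_single],
    llm=llm,
    embeddings=embeddings,
)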

123gbk commented 4 months ago

I'm having the same problem.

Mervyn-1 commented 4 months ago

same problem

jjmachan commented 4 months ago

@jidechao @123gbk @Mervyn-1 which model service are you using?

The issue is that you're using ChatOpenAI, so ragas expects the n parameter to be supported, but I realize we need a way to override that now.

Xiyuche commented 3 months ago

> @jidechao @123gbk @Mervyn-1 which model service are you using?
>
> The issue is that you're using ChatOpenAI, so ragas expects the n parameter to be supported, but I realize we need a way to override that now.

I guess the problem might be with deepseek-chat, which does not support n > 1. I'm not entirely sure about that, but I just found that both @jidechao and I are using the DeepSeek OpenAI-compatible mode.

Maybe we can fix this problem by calling the DeepSeek API n times with one reply each time, rather than calling it once for n replies.

PhillipRt commented 3 months ago

This issue is specifically related to the Deepseek API, which doesn't support the OpenAI API feature of generating multiple responses with a single call (the n parameter).

I've implemented a fix by modifying the _ascore method in src/ragas/metrics/_answer_relevance.py. Here's the updated code:

# Module-level imports this method relies on (assumed to already be present
# in _answer_relevance.py):
import typing as t

import numpy as np
from langchain_core.callbacks import Callbacks


async def _ascore(self, row: t.Dict, callbacks: Callbacks) -> float:
    assert self.llm is not None, "LLM is not set"

    prompt = self._create_question_gen_prompt(row)

    # Generate `strictness` answers one at a time instead of asking the
    # backend for several completions in a single call (n > 1), which
    # endpoints such as DeepSeek's reject.
    answers = []
    for _ in range(self.strictness):
        result = await self.llm.generate(
            prompt,
            n=1,
            callbacks=callbacks,
        )
        generated_text = result.generations[0][0].text
        answer = await _output_parser.aparse(generated_text, prompt, self.llm)
        if answer is not None:
            answers.append(answer)

    # If every generation failed to parse, the score is undefined.
    if not answers:
        return np.nan

    return self._calculate_score(answers, row)

This solution loops through the generation process self.strictness times, generating a single response on each iteration. This workaround allows the code to work with APIs like DeepSeek that don't support the n parameter for multiple generations.

However, it's worth noting that this approach might have performance implications when used with other models that do support generating multiple responses in a single call.

A better solution would be a mechanism that switches between looping and using the n parameter based on the capabilities of the underlying LLM, but I don't know what's the best way to add this information to the LLM class.
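
A rough sketch of what that switch might look like inside the metric (the helper name and attribute are illustrative, not the actual ragas internals):

async def _generate_candidates(self, prompt, callbacks):
    # Ask for several completions in one request only when the wrapped LLM
    # advertises support for the OpenAI-style n parameter; otherwise fall
    # back to one request per completion (e.g. for DeepSeek).
    if getattr(self.llm, "multiple_completion_supported", False):
        result = await self.llm.generate(prompt, n=self.strictness, callbacks=callbacks)
        generations = result.generations[0]
    else:
        generations = []
        for _ in range(self.strictness):
            result = await self.llm.generate(prompt, n=1, callbacks=callbacks)
            generations.append(result.generations[0][0])
    return [g.text for g in generations]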

jjmachan commented 3 months ago

@Xiyuche actually we do that for LLMs that don't support the n parameter, and the solution is similar to what @PhillipRt has already suggested (thanks a lot btw 🙂 ❤️).

The con is that it's hard to keep such a list up to date. For example, ChatOpenAI does support it, but since you're using that same class here, this happens:

https://github.com/explodinggradients/ragas/blob/b988f538b968b378604240d2b279f8bf4d8ecca9/src/ragas/llms/base.py#L29-L35

https://github.com/explodinggradients/ragas/blob/b988f538b968b378604240d2b279f8bf4d8ecca9/src/ragas/llms/base.py#L39-L44

I think the solution would be to have an argument inside the LLM for this.
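
Something along these lines could work as that override, perhaps (the class and argument names are illustrative, not an existing ragas API):

from ragas.llms import LangchainLLMWrapper

class ConfigurableLLMWrapper(LangchainLLMWrapper):
    # Carry an explicit capability flag so metrics can check it instead of
    # relying on an isinstance() allow-list of LLM classes.
    def __init__(self, langchain_llm, multiple_completion_supported: bool = True, **kwargs):
        super().__init__(langchain_llm, **kwargs)
        self.multiple_completion_supported = multiple_completion_supported

# Usage: declare that the DeepSeek endpoint only accepts n = 1.
# ragas_llm = ConfigurableLLMWrapper(llm, multiple_completion_supported=False)

The metric code would then consult this flag (as in the sketch a couple of comments above) rather than maintaining a hard-coded list of classes.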

Xiyuche commented 3 months ago

> @Xiyuche actually we do that for LLMs that don't support the n parameter, and the solution is similar to what @PhillipRt has already suggested (thanks a lot btw 🙂 ❤️).
>
> [...]
>
> I think the solution would be to have an argument inside the LLM for this.

Thank you. A lot of API providers like DeepSeek recommend using the OpenAI-compatible mode directly, which is, however, not perfectly compatible with the OpenAI API 😂, n being one example.