explodinggradients/ragas


WARNING:ragas.metrics._faithfulness:No statements were generated from the answer. #1651

Open · kyuz0 opened this issue 1 week ago

kyuz0 commented 1 week ago

The official example for computing faithfulness on a single sample, taken straight from the website docs, doesn't work:

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

# Wrap the LangChain LLM and embeddings so ragas can use them
# (requires OPENAI_API_KEY to be set in the environment)
evaluator_llm = LangchainLLMWrapper(ChatOpenAI())
evaluator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# Create a single-turn sample
from ragas import SingleTurnSample

sample = SingleTurnSample(
    user_input="When was the first super bowl?",
    response="The first superbowl was held on Jan 15, 1967",
    retrieved_contexts=[
        "The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles."
    ],
)

# Init the metric and score the sample
# (top-level await works in a notebook; in a script, wrap this in asyncio.run())
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=evaluator_llm)
score = await faithfulness_metric.single_turn_ascore(sample=sample)
print(score)

This results in the following warning and a NaN score:

WARNING:ragas.metrics._faithfulness:No statements were generated from the answer.

What am I missing or doing wrong?

AshishSardana commented 1 week ago

I came across the same bug recently. Ragas separates the response into sentences based on the logic here, i.e. if sentence.strip().endswith((".", "。", "!", "!"))

Since your response string doesn't end with any of these characters, an empty response is sent to the faithfulness check. Try appending one of these sentence terminators (e.g. ".") to the response to verify.
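A quick way to see the filter in action is to reproduce the predicate quoted above in plain Python (no ragas or API key needed), applied to the response string from the repro:

# The sentence filter quoted above, applied to the response from the repro
terminators = (".", "。", "!", "!")

response = "The first superbowl was held on Jan 15, 1967"
print(response.strip().endswith(terminators))  # False: the sentence is filtered out

fixed = response + "."
print(fixed.strip().endswith(terminators))     # True: the sentence is kept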

grobruegge commented 3 days ago

This sentence-splitting logic is actually a bit inconsistent across the different metrics (I am using version 0.2.5).

For me, it is not clear why the sentences need to be filtered in the first place, since the sentence_splitter already does a good job.

In my use case, I do not control the generation of the datasets, and some ground-truth answers may not contain any of these terminators (e.g. ".") at the end of the sentence. An example is a response like the one above, "The first superbowl was held on Jan 15, 1967", which has no trailing period.

Thus, I created a custom subclass of the metric and changed the _create_statements() method so that the sentences are no longer filtered (i.e., I removed the if sentence.strip().endswith(".") check).

Question: Is there any unexpected behavior that might occur due to this?
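For reference, a minimal sketch of this kind of subclass. Since the body of _create_statements() varies across ragas versions, this variant takes a slightly different route than deleting the filter: it normalizes the response so it passes the endswith check, then delegates to the parent implementation. The method name comes from the comment above; its exact signature (a row dict plus callbacks) is an assumption based on ragas 0.2.x and may differ in other versions, so treat this as illustrative rather than a drop-in patch:

from ragas.metrics import Faithfulness


class PunctuationTolerantFaithfulness(Faithfulness):
    """Sketch: ensure the response ends with a sentence terminator before
    statement generation, so answers without one aren't silently dropped."""

    async def _create_statements(self, row, callbacks):
        # Assumed signature (ragas 0.2.x): row is a dict with a "response" key.
        # Rather than removing the filter itself, make the response satisfy it.
        response = row.get("response", "")
        if not response.strip().endswith((".", "。", "!", "!")):
            row = {**row, "response": response.strip() + "."}
        return await super()._create_statements(row, callbacks)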