explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

faithfulness_score: nan #1309

Open beatG123 opened 4 days ago

beatG123 commented 4 days ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

The faithfulness_score is always nan.

Code Examples

    from agent import *
    from datasets import Dataset
    import json
    import random

Load the text

    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader("https://baike.baidu.com/item/%E7%BA%BD%E7%BA%A6/6230")
    loader.requests_kwargs = {'verify': False}
    data = loader.load()

    print(data)

Create the vector index

    from langchain.indexes import VectorstoreIndexCreator

    index = VectorstoreIndexCreator().from_loaders([loader])

Create the QA chain

    from langchain.chains import RetrievalQA
    from langchain_community.chat_models import ChatOpenAI

    llm = ChatOpenAI()
    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=index.vectorstore.as_retriever(),
        return_source_documents=True,
        verbose=VERBOSE,
    )

    question = "纽约市的名字是怎么得来的?"
    result = qa_chain({"query": question})

    result["result"]

    print("========= chain result ==========")
    print(result)

    result['ground_truths'] = "纽约市的名字“纽约”来源于荷兰战败后将新阿姆斯特丹割让给英国的事件。"

    from ragas.metrics import faithfulness, answer_relevancy, context_relevancy, context_recall
    from ragas.langchain.evalchain import RagasEvaluatorChain

make eval chains

    eval_chains = {
        m.name: RagasEvaluatorChain(metric=m)
        for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
    }

evaluate

    for name, eval_chain in eval_chains.items():
        score_name = f"{name}_score"
        print(f"{score_name}: {eval_chain(result)[score_name]}")

Additional context

    Entering new RetrievalQA chain...

    Finished chain.

    faithfulness_score: nan
    answer_relevancy_score: 0.8500799716619457
    context_relevancy_score: 0.016666666666666666
    context_recall_score: 1.0

dosubot[bot] commented 4 days ago

The faithfulness_score is always nan in your evaluation because metrics like faithfulness require the source_documents to be present in the result. Ensure that the source_documents are included in the result when running the evaluation.

Here are some key points to check:

  1. Ensure source_documents are included in the result:

    result = qa_chain({"query": question})
    if not result.get("source_documents"):
        raise ValueError("Source documents are missing in the result.")
  2. Check the segmentation and statement generation: The faithfulness metric uses a sentence segmenter to break down the response into simpler statements. Ensure that this segmentation is working correctly.

    from some_sentence_segmenter import segmenter  # Replace with actual import
    sentences = segmenter.segment(result["result"])
    sentences = [sentence for sentence in sentences if sentence.strip().endswith(".")]
  3. Verify the NLI (Natural Language Inference) step: The metric uses NLI to judge the faithfulness of the statements based on the context. Ensure that the NLI model is correctly set up and generating valid results.

    from some_nli_model import nli_model  # Replace with actual import
    contexts_str = " ".join([doc.page_content for doc in result["source_documents"]])
    statements_str = " ".join(sentences)
    p_value = f"Context: {contexts_str}\nStatements: {statements_str}"
    nli_result = nli_model.generate(p_value)
  4. Evaluate faithfulness:

    eval_result = eval_chains["faithfulness"](result)
    print(f"faithfulness_score: {eval_result['faithfulness_score']}")

By ensuring these steps are correctly implemented and the necessary data is present, you should be able to avoid nan values in your faithfulness_score [1][2].
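As a cross-check, the same metrics can also be computed through ragas.evaluate with a datasets.Dataset (the Dataset import in the script above is otherwise unused). This is only a sketch: it reuses the question and result variables from that script, and the expected column names (ground_truths vs. ground_truth) differ between ragas versions, so adjust to the version you have installed.

    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import faithfulness, answer_relevancy, context_recall

    # Build a single-row dataset from the RetrievalQA output above.
    ds = Dataset.from_dict({
        "question": [question],
        "answer": [result["result"]],
        "contexts": [[doc.page_content for doc in result["source_documents"]]],
        "ground_truths": [[result["ground_truths"]]],  # older versions expect a list of strings
    })

    scores = evaluate(ds, metrics=[faithfulness, answer_relevancy, context_recall])
    print(scores)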

To continue talking to Dosu, mention @dosu.

beatG123 commented 4 days ago

This is probably caused by the Chinese text.
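If the Chinese text is indeed the culprit, one plausible mechanism is the ASCII-period filter shown in dosubot's second point: statements that end with the full-width "。" would all be discarded, leaving nothing for the NLI step and producing nan. A minimal sketch of that failure mode (the answer string and the splitting are made up for illustration, not ragas internals):

    # Hypothetical illustration of the suspected failure mode.
    answer = "纽约市的名字来源于荷兰将新阿姆斯特丹割让给英国。"
    sentences = [s + "。" for s in answer.split("。") if s]

    # Same condition as in the segmentation snippet above:
    # keep only sentences that end with an ASCII ".".
    kept = [s for s in sentences if s.strip().endswith(".")]

    print(kept)  # [] -> zero statements to verify, so the score comes out nan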