Open beatG123 opened 4 days ago
The `faithfulness_score` is always `nan` in your evaluation because metrics like `faithfulness` require the `source_documents` to be present in the result. Ensure that the `source_documents` are included in the result when running the evaluation.
Here are some key points to check:

1. **Ensure `source_documents` are included in the result:**

```python
result = qa_chain({"query": question})
if not result.get("source_documents"):
    raise ValueError("Source documents are missing in the result.")
```
2. **Check the segmentation and statement generation:** The `faithfulness` metric uses a sentence segmenter to break down the response into simpler statements. Ensure that this segmentation is working correctly.

```python
from some_sentence_segmenter import segmenter  # Replace with actual import

sentences = segmenter.segment(result["result"])
sentences = [sentence for sentence in sentences if sentence.strip().endswith(".")]
```
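The real ragas segmenter is internal, but the filtering step above can be reproduced with a simple sketch (`split_sentences` below is a hypothetical helper, not a ragas API). Note that an `endswith(".")` filter silently drops sentences that end with any other terminator, such as the Chinese full stop 。:

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive segmenter for illustration: split after ASCII or CJK
    # sentence terminators, keeping the terminator on each sentence.
    parts = re.split(r'(?<=[.!?。！？])\s*', text)
    return [p for p in parts if p.strip()]

english = "New York was named after the Duke of York. It was once New Amsterdam."
chinese = "纽约市得名于约克公爵。它曾经叫新阿姆斯特丹。"

for text in (english, chinese):
    sentences = split_sentences(text)
    kept = [s for s in sentences if s.strip().endswith(".")]
    # The English text keeps both sentences; the Chinese text keeps none.
    print(len(sentences), len(kept))
```

With zero surviving statements there is nothing for the metric to score, which is one way the evaluation can end up at `nan`.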
3. **Verify the NLI (Natural Language Inference) step:** The metric uses NLI to judge the faithfulness of the statements based on the context. Ensure that the NLI model is correctly set up and generating valid results.

```python
from some_nli_model import nli_model  # Replace with actual import

contexts_str = " ".join([doc.page_content for doc in result["source_documents"]])
statements_str = " ".join(sentences)
p_value = f"Context: {contexts_str}\nStatements: {statements_str}"
nli_result = nli_model.generate(p_value)
```
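Conceptually, faithfulness is the fraction of statements the NLI step judges as supported by the context, so if segmentation yields zero statements the score is 0/0. A toy illustration with a stubbed verdict list (not ragas's actual scoring code):

```python
import math

def faithfulness_score(verdicts: list[bool]) -> float:
    # Fraction of statements supported by the context.
    # With zero statements the ratio is undefined (0/0), which we
    # surface as nan, mirroring the behaviour reported in this issue.
    if not verdicts:
        return float("nan")
    return sum(verdicts) / len(verdicts)

print(faithfulness_score([True, True, False]))  # 2 of 3 statements supported
print(math.isnan(faithfulness_score([])))       # no statements at all
```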
4. **Evaluate faithfulness:**

```python
eval_result = eval_chains["faithfulness"](result)
print(f"faithfulness_score: {eval_result['faithfulness_score']}")
```
By ensuring these steps are correctly implemented and the necessary data is present, you should be able to avoid `nan` values in your `faithfulness_score` [1][2].
To continue talking to Dosu, mention @dosu.
This is probably a problem caused by the Chinese text.
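If the segmentation filter keeps only statements ending in an ASCII period, as in the answer above, a Chinese response yields zero statements, and scoring zero statements produces `nan`. A minimal demonstration of that mechanism (assuming such a filter; the exact internals may differ):

```python
# A Chinese answer: sentences end with the full-width stop 。, not "."
answer = "纽约市得名于约克公爵。这一名称始于1664年。"

# Naive segmentation on the Chinese full stop, keeping the terminator
sentences = [s + "。" for s in answer.split("。") if s.strip()]

# Mimic the ASCII-period filter from the answer above
statements = [s for s in sentences if s.strip().endswith(".")]

print(statements)  # [] -> every sentence was dropped, so nothing is scored
```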
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question

`faithfulness_score` is always `nan`.
Code Examples

```python
from agent import *
from datasets import Dataset
import json
import random

# Load the document
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://baike.baidu.com/item/%E7%BA%BD%E7%BA%A6/6230")
loader.requests_kwargs = {'verify': False}
data = loader.load()
print(data)

# Create the vector index
from langchain.indexes import VectorstoreIndexCreator
index = VectorstoreIndexCreator().from_loaders([loader])

# Create the QA chain
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI
llm = ChatOpenAI()
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=index.vectorstore.as_retriever(),
    return_source_documents=True,
    verbose=True,
)

# "How did New York City get its name?"
question = "纽约市的名字是怎么得来的?"
result = qa_chain({"query": question})
result["result"]
print("========= chain result ==========")
print(result)

# Ground truth: "The name 'New York' comes from the Dutch ceding
# New Amsterdam to England after their defeat."
result['ground_truths'] = "纽约市的名字“纽约”来源于荷兰战败后将新阿姆斯特丹割让给英国的事件。"

from ragas.metrics import faithfulness, answer_relevancy, context_relevancy, context_recall
from ragas.langchain.evalchain import RagasEvaluatorChain

# make eval chains
eval_chains = {
    m.name: RagasEvaluatorChain(metric=m)
    for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
}

# evaluate
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"{score_name}: {eval_chain(result)[score_name]}")
```
Additional context