explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

How is it ensured that ground truth answers are complete and not just partially correct? #1218

Open neerajchhimwal opened 2 months ago

neerajchhimwal commented 2 months ago

[👍 ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

During testset generation using the LangChain docs, is one question (say, a simple one) answered by looking at the entire document? How is it ensured that the ground truth answer will be complete if the document is long? Could it be that the answer comes from only one of the "chunks" of this document and is therefore only partially correct?

dosubot[bot] commented 2 months ago

When a test set is generated from LangChain docs, the TestsetGenerator class relies on a DocumentStore that handles chunking and storing the documents. The InMemoryDocumentStore is initialized with a TokenTextSplitter that splits each document into manageable chunks (1024 tokens by default), so the entire document, not just one fragment, is processed, stored, and available for retrieval. This is the first safeguard against answers that are only partially correct when a document is long.

Additionally, Ragas employs an evolutionary generation paradigm to create diverse and complex questions from the provided documents, so that answering a question can require information from multiple related sections or chunks rather than a single one, which improves the completeness of the ground truth answers. The evolution types used include (a minimal invocation is sketched after this list):

- simple: a straightforward question generated from a seed chunk
- reasoning: the question is rewritten so that answering it requires multi-step reasoning over the context
- multi_context: the question is rewritten so that answering it requires information from several related chunks
- conditional: the question is rewritten with a conditional element that adds complexity
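
As a rough sketch of how this looks in practice, assuming the ragas 0.1.x test set generation API and OpenAI models (adapt the loader, imports, and distribution to your setup):

from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# Load your corpus with any LangChain loader (path and glob are placeholders).
documents = DirectoryLoader("docs/", glob="**/*.md").load()

# Convenience constructor backed by OpenAI models (ragas 0.1.x).
generator = TestsetGenerator.with_openai()

# The distribution controls how many questions of each evolution type are produced.
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)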

The AbstractQA class in Ragas further supports completeness by generating specific questions from particular sections of documents. It uses prompts to order sections by relevance and to retrieve chunks of text from nodes, so that the generated questions and answers cover multiple sections where necessary. A critic mechanism then evaluates the generated questions against quality standards before answers are produced.

Moreover, the SimpleTestGenerator class in Ragas includes a detailed document extraction and splitting process. It uses various extractors like summary_extractor, link_extractor, email_extractor, keyphrase_extractor, title_extractor, and headline_extractor to extract relevant information from documents. The HeadlineSplitter is then used to split documents based on headlines, ensuring that chunks are logically segmented.
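
To illustrate the idea behind headline-based splitting, here is a generic stand-in, not ragas's actual HeadlineSplitter implementation: chunks are cut at heading boundaries so each one stays within a single logical section.

import re

def split_on_headlines(text: str) -> list[str]:
    # Treat Markdown-style headings as section boundaries.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]

doc = "# Intro\nSome overview text.\n## Details\nMore specific content."
for chunk in split_on_headlines(doc):
    print(chunk)
    print("---")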

The nodes created from these chunks are embedded and further processed to extract keyphrases and other relevant information. Relationships between nodes are built using similarity functions like Jaccard and Cosine, ensuring that related chunks are connected. This comprehensive extraction, splitting, and relationship-building process ensures that the generated questions and answers are complete and derived from multiple relevant sections of the document.
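
To make the relationship-building step concrete, here is a minimal sketch of the two similarity measures mentioned above, computed over two hypothetical nodes' keyphrase sets and embedding vectors (illustrative only, not ragas's internal code):

import numpy as np

def jaccard_similarity(a: set[str], b: set[str]) -> float:
    # Overlap of the two keyphrase sets relative to their union.
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Angle-based similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

node_a = {"keyphrases": {"chunking", "document store", "embeddings"},
          "embedding": np.array([0.1, 0.3, 0.9])}
node_b = {"keyphrases": {"embeddings", "similarity", "chunking"},
          "embedding": np.array([0.2, 0.25, 0.8])}

print(jaccard_similarity(node_a["keyphrases"], node_b["keyphrases"]))
print(cosine_similarity(node_a["embedding"], node_b["embedding"]))

Nodes whose scores exceed a threshold are linked, so a question generated from one chunk can pull in its related chunks when the ground truth answer is composed.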

Here is the relevant code snippet:

# Import paths assume ragas 0.1.x; adjust for your installed version.
from langchain.text_splitter import TokenTextSplitter
from ragas.run_config import RunConfig
from ragas.testset.docstore import InMemoryDocumentStore

# embeddings_model and keyphrase_extractor are assumed to be configured
# elsewhere (a wrapped embeddings model and a keyphrase extractor).
run_config = RunConfig()

# Chunk documents into 1024-token pieces (the default) before storing them.
splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
docstore = InMemoryDocumentStore(
    splitter=splitter,
    embeddings=embeddings_model,
    extractor=keyphrase_extractor,
    run_config=run_config,
)

This chunking mechanism, combined with the evolutionary generation paradigm and the extraction and splitting pipeline described above, means that even for a long document the answer can draw on several chunks rather than a single one. The AbstractQA and SimpleTestGenerator classes combine prompts, critic mechanisms, and relationship building between nodes to generate and validate questions, so the resulting answers are comprehensive and derived from all of the relevant sections of the document [1][2][3][4][5].

To continue talking to Dosu, mention @dosu.