explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

How is it ensured that ground truth answers are complete and not just partially correct? #1218

Open neerajchhimwal opened 2 months ago

neerajchhimwal commented 2 months ago

[👍 ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

During testset generation using the LangChain docs, is one question (say, a simple one) answered by looking at the entire document? How is it ensured that the ground truth answer will be complete if the document is long? Could it be that the answer comes from only one of the "chunks" of this document and is therefore only partially correct?

dosubot[bot] commented 2 months ago

When a test set is generated from LangChain docs, the TestsetGenerator class relies on a DocumentStore that handles chunking and storing the documents. The InMemoryDocumentStore is initialized with a TokenTextSplitter that splits each document into manageable chunks (1024 tokens by default), so the entire document, not just one fragment, is processed, stored, and available for retrieval. This is the first safeguard against answers that are only partially correct when a document is long.

Additionally, Ragas employs an evolutionary generation paradigm to create diverse and complex questions from the provided documents, so that answering a question can require information from multiple related sections or chunks rather than a single one, which improves the completeness of the ground truth answers. The evolution types used include (a minimal invocation is sketched after this list):

- simple: a straightforward question generated from a seed chunk
- reasoning: the question is rewritten so that answering it requires multi-step reasoning over the context
- multi_context: the question is rewritten so that answering it requires information from several related chunks
- conditional: the question is rewritten with a conditional element that adds complexity
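
As a rough sketch of how this looks in practice, assuming the ragas 0.1.x test set generation API and OpenAI models (adapt the loader, imports, and distribution to your setup):

from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# Load your corpus with any LangChain loader (path and glob are placeholders).
documents = DirectoryLoader("docs/", glob="**/*.md").load()

# Convenience constructor backed by OpenAI models (ragas 0.1.x).
generator = TestsetGenerator.with_openai()

# The distribution controls how many questions of each evolution type are produced.
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)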

The AbstractQA class in Ragas further supports completeness by generating specific questions from particular sections of documents. It uses prompts to order sections by relevance and to retrieve chunks of text from nodes, so that the generated questions and answers cover multiple sections where necessary. A critic mechanism then evaluates the generated questions against quality standards before answers are produced.

Moreover, the SimpleTestGenerator class in Ragas includes a detailed document extraction and splitting process. It uses various extractors like summary_extractor, link_extractor, email_extractor, keyphrase_extractor, title_extractor, and headline_extractor to extract relevant information from documents. The HeadlineSplitter is then used to split documents based on headlines, ensuring that chunks are logically segmented.
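
To illustrate the idea behind headline-based splitting, here is a generic stand-in, not ragas's actual HeadlineSplitter implementation: chunks are cut at heading boundaries so each one stays within a single logical section.

import re

def split_on_headlines(text: str) -> list[str]:
    # Treat Markdown-style headings as section boundaries.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]

doc = "# Intro\nSome overview text.\n## Details\nMore specific content."
for chunk in split_on_headlines(doc):
    print(chunk)
    print("---")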

The nodes created from these chunks are embedded and further processed to extract keyphrases and other relevant information. Relationships between nodes are built using similarity functions like Jaccard and Cosine, ensuring that related chunks are connected. This comprehensive extraction, splitting, and relationship-building process ensures that the generated questions and answers are complete and derived from multiple relevant sections of the document.
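
To make the relationship-building step concrete, here is a minimal sketch of the two similarity measures mentioned above, computed over two hypothetical nodes' keyphrase sets and embedding vectors (illustrative only, not ragas's internal code):

import numpy as np

def jaccard_similarity(a: set[str], b: set[str]) -> float:
    # Overlap of the two keyphrase sets relative to their union.
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Angle-based similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

node_a = {"keyphrases": {"chunking", "document store", "embeddings"},
          "embedding": np.array([0.1, 0.3, 0.9])}
node_b = {"keyphrases": {"embeddings", "similarity", "chunking"},
          "embedding": np.array([0.2, 0.25, 0.8])}

print(jaccard_similarity(node_a["keyphrases"], node_b["keyphrases"]))
print(cosine_similarity(node_a["embedding"], node_b["embedding"]))

Nodes whose scores exceed a threshold are linked, so a question generated from one chunk can pull in its related chunks when the ground truth answer is composed.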

Here is the relevant code snippet:

# Import paths assume ragas 0.1.x; adjust for your installed version.
from langchain.text_splitter import TokenTextSplitter
from ragas.run_config import RunConfig
from ragas.testset.docstore import InMemoryDocumentStore

# embeddings_model and keyphrase_extractor are assumed to be configured
# elsewhere (a wrapped embeddings model and a keyphrase extractor).
run_config = RunConfig()

# Chunk documents into 1024-token pieces (the default) before storing them.
splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
docstore = InMemoryDocumentStore(
    splitter=splitter,
    embeddings=embeddings_model,
    extractor=keyphrase_extractor,
    run_config=run_config,
)

This chunking mechanism, combined with the evolutionary generation paradigm and the extraction and splitting pipeline described above, means that even for a long document the answer can draw on several chunks rather than a single one. The AbstractQA and SimpleTestGenerator classes combine prompts, critic mechanisms, and relationship building between nodes to generate and validate questions, so the resulting answers are comprehensive and derived from all of the relevant sections of the document [1][2][3][4][5].

To continue talking to Dosu, mention @dosu.