Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

Pinpoint raw evidence chunks #464

Open athena-yao opened 3 weeks ago

athena-yao commented 3 weeks ago

I want to be able to pinpoint the exact sentence(s) in a document that data was extracted from so that I can cross-check for accuracy. answer.context gives summarized chunks (text that has already been run through the summary LLM) but not the raw source text. I've also tried calling await docs.aget_evidence and the gather_evidence tool without success. How can I get the raw evidence chunks associated with answers?
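
For reference, a minimal sketch of what I'm doing now (assuming a Docs object already populated with papers):

answer = await docs.aquery("Your question here")
for c in answer.contexts:
    print(c.context)  # summarized chunk produced by the summary LLM, not the raw source text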

dosubot[bot] commented 3 weeks ago

Hey @athena-yao! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

To get the raw evidence chunks associated with an answer, run the evidence-gathering step and then read each Context's text attribute: every Context in answer.contexts keeps a reference to the original Text chunk it was summarized from, so the raw chunk remains available even though the context string itself is produced by the summary LLM. Here is an example using the GatherEvidence tool:

from paperqa import Docs, Settings
from paperqa.agents.env import EnvironmentState
from paperqa.agents.tools import GatherEvidence
from paperqa.llms import EmbeddingModel, LiteLLMEmbeddingModel, LiteLLMModel
from paperqa.types import Answer, Text

async def get_raw_evidence(docs: Docs, query: str, settings: Settings, embedding_model: EmbeddingModel, summary_llm_model: LiteLLMModel) -> list[Text]:
    state = EnvironmentState(docs=docs, answer=Answer(question=query))
    gather_evidence_tool = GatherEvidence(settings=settings, embedding_model=embedding_model, summary_llm_model=summary_llm_model)
    await gather_evidence_tool.gather_evidence(query, state)
    # Each Context keeps a reference to the original Text chunk it was summarized from
    return [context.text for context in state.answer.contexts]

# Usage
docs = Docs()  # add your documents with await docs.aadd(...) first
query = "Your question here"
settings = Settings()
embedding_model = LiteLLMEmbeddingModel()
summary_llm_model = LiteLLMModel(name=settings.summary_llm)
raw_evidence_chunks = await get_raw_evidence(docs, query, settings, embedding_model, summary_llm_model)
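
Each returned Text is the exact chunk that was retrieved, so you can print or search it directly when cross-checking a sentence. A minimal follow-up sketch (using the raw_evidence_chunks variable from above):

for text in raw_evidence_chunks:
    print(text.name)  # which document and chunk the evidence came from
    print(text.text)  # the raw chunk, before any summarization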

In other words, text.text holds the unsummarized chunk and text.doc points back to the source Doc, so you can trace each piece of evidence to the exact passage it was extracted from [1][2].
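
If you'd rather not construct the tool yourself, a simpler sketch of the same idea goes through Docs.aget_evidence directly (assuming docs is already populated via await docs.aadd(...); argument names may differ slightly across paper-qa versions):

answer = await docs.aget_evidence(Answer(question="Your question here"), settings=settings)
for context in answer.contexts:
    print(context.score)      # relevance score assigned during evidence gathering
    print(context.context)    # summarized chunk
    print(context.text.text)  # raw chunk the summary was generated from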

To continue talking to Dosu, mention @dosu.