Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0
6.33k stars 599 forks

Bypassing Some Pre-processing and Validation Steps in Pipeline for Faster Document Ingestion #408

Open markokow opened 1 month ago

markokow commented 1 month ago

I’ve been using a document that isn't from a scientific journal. When I was on version 4.9.0, the prompt response was quick, but after transitioning to 5.2.0, it took much longer to get an answer after ingesting the document.

It seems this delay is due to additional validation and processing steps in the pipeline, such as:

Given these issues, is there a way to bypass these steps in the pipeline? I'm using the Docs object because I need to post-process the response in Python.

Here is my code snippet:

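from paperqa import Docs, Settings

# doc_paths: paths to the local documents to ingest (defined elsewhere)
docs = Docs()

# Ingest each document; this is the step that became slower after upgrading
for doc in doc_paths:
    docs.add(doc)

settings = Settings()

answer = docs.query(
    "suggest a good credit card",
    settings=settings,
)

print(answer.formatted_answer)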

Thank you!


jamesbraza commented 1 month ago

Oh interesting, thanks for the feedback and sorry for the performance regression.

I think when you call Docs.aadd you will want to override the default clients. This is the logic you want to tap into: https://github.com/Future-House/paper-qa/blob/v5.0.3/paperqa/docs.py#L330-L336
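A rough sketch of what that override might look like, assuming Docs.aadd forwards a `clients` keyword argument to its metadata lookup as the linked lines suggest. The helper name `add_with_clients` and the `my_clients` placeholder are hypothetical, not part of paper-qa, and which client collections are accepted should be checked against the linked source:

```python
import asyncio
from collections.abc import Collection, Iterable
from typing import Any


async def add_with_clients(
    docs: Any, doc_paths: Iterable[str], clients: Collection[Any]
) -> list[Any]:
    """Add each path through docs.aadd, passing a custom `clients` collection.

    Assumption: docs.aadd accepts `clients` as a keyword argument and uses it
    instead of its default metadata clients.
    """
    results = []
    for path in doc_paths:
        # Forward the caller's clients rather than relying on the defaults
        results.append(await docs.aadd(path, clients=clients))
    return results


# Hypothetical usage (requires paper-qa; my_clients is a placeholder for the
# reduced set of metadata clients you want to keep):
#
#   from paperqa import Docs
#   docs = Docs()
#   asyncio.run(add_with_clients(docs, doc_paths, clients=my_clients))
```

The helper only changes which clients are handed to each aadd call; whether a smaller set actually speeds up ingestion would need to be measured on your documents.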