Bypassing Some Pre-processing and Validation Steps in Pipeline for Faster Document Ingestion

I’ve been using a document that isn't from a scientific journal. On version 4.9.0 the response to a prompt came back quickly, but after upgrading to 5.2.0 it takes much longer to get an answer after ingesting the document.

It seems this delay is due to additional validation and processing steps in the pipeline, such as:

Checking environment variables like SEMANTIC_SCHOLAR_API_KEY and CROSSREF_API_KEY
Retrieving metadata for SemanticScholarProvider and CrossrefProvider

The most significant delay was caused by the "Failed to generate bibtex" error.

Given these issues, is there a way to bypass these steps in the pipeline? I'm using the Docs object because I need to post-process the response in Python.

Here is my code snippet:

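from paperqa import Docs, Settings

# doc_paths is a list of local file paths, defined elsewhere.
docs = Docs()

# Ingest each document; this is the step that became slow in 5.2.0.
for doc in doc_paths:
    docs.add(doc)

settings = Settings()

answer = docs.query(
    "suggest a good credit card",
    settings=settings,
)

print(answer.formatted_answer)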

Thank you!
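
A minimal sketch of one possible workaround, not a confirmed fix. It rests on two assumptions that should be checked against the installed version: that the parsing settings expose a `use_doc_details` flag controlling the Crossref and Semantic Scholar metadata lookups, and that passing `citation` and `docname` to `docs.add` avoids the LLM call that infers a citation. The helper names `manual_doc_fields` and `ingest_fast` are hypothetical and exist only for illustration.

```python
from pathlib import Path


def manual_doc_fields(path: str) -> dict:
    """Derive citation and docname from the file name (hypothetical helper)."""
    stem = Path(path).stem
    return {"citation": f"{stem} (local file)", "docname": stem}


def ingest_fast(doc_paths):
    """Add documents while trying to skip metadata lookups (hypothetical helper)."""
    # Imported lazily so manual_doc_fields works without paper-qa installed.
    from paperqa import Docs, Settings

    settings = Settings()
    # Assumption: this flag exists in the installed version and gates the
    # Crossref / Semantic Scholar metadata retrieval.
    settings.parsing.use_doc_details = False

    docs = Docs()
    for path in doc_paths:
        # Assumption: an explicit citation and docname mean no citation
        # has to be inferred from the document text.
        docs.add(path, settings=settings, **manual_doc_fields(path))
    return docs, settings
```

The returned `settings` object can then be passed to `docs.query(...)` as in the snippet above, so ingestion and querying use the same configuration.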

Future-House / paper-qa, issue #408