Open Snikch63200 opened 1 month ago
To disable document checking and prevent the error, set the disable_doc_valid_check parameter to True in your configuration. Here's how you can do it:
from paperqa.settings import ParsingSettings
# Create a ParsingSettings instance with document validity check disabled
parsing_settings = ParsingSettings(disable_doc_valid_check=True)
This will bypass the document validity check, allowing documents with images to be processed without raising an error [1].
To continue talking to Dosu, mention @dosu.
@dosu the error seems to persist even if parsing_settings = ParsingSettings(disable_doc_valid_check=True) is passed.
Hi @DGoettlich - you need to pass the setting:
from paperqa import Settings, Docs
settings = Settings()
settings.parsing.disable_doc_valid_check = True
d = Docs()
d.add(...., settings=settings)
Ensure that the ParsingSettings instance with disable_doc_valid_check=True is correctly passed to the function or module responsible for processing the document. The aadd function in the Docs class is responsible for document validation and will skip the check if this setting is properly applied [1]. Double-check that the configuration is being used where needed.
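The point made in the replies above, that a settings object only takes effect when it is actually handed to the call that validates the document, can be illustrated with a small self-contained sketch. It deliberately does not import paperqa: ParsingSketch, SettingsSketch, looks_like_text and add_document are hypothetical stand-ins invented for illustration, and the "at least half the characters are letters" rule is an assumed toy heuristic, not the library's real check.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsingSketch:
    # Hypothetical stand-in for a parsing settings group.
    disable_doc_valid_check: bool = False


@dataclass
class SettingsSketch:
    # Hypothetical stand-in for a top-level settings object.
    parsing: ParsingSketch = field(default_factory=ParsingSketch)


def looks_like_text(text: str) -> bool:
    # Toy validity check (an assumption): at least half the characters are letters.
    if not text:
        return False
    return sum(ch.isalpha() for ch in text) / len(text) >= 0.5


def add_document(text: str, settings: Optional[SettingsSketch] = None) -> bool:
    # Without an explicit settings argument, defaults apply and the check runs.
    if settings is None:
        settings = SettingsSketch()
    if not settings.parsing.disable_doc_valid_check and not looks_like_text(text):
        raise ValueError("This does not look like a text document.")
    return True


if __name__ == "__main__":
    scanned = "12 ## 34"  # mimics a page whose extracted text is mostly non-letters

    # Created but never passed anywhere, so it cannot influence add_document.
    unused = ParsingSketch(disable_doc_valid_check=True)
    try:
        add_document(scanned)
    except ValueError as err:
        print("rejected:", err)

    # Setting the flag on the object that is passed in skips the check.
    settings = SettingsSketch()
    settings.parsing.disable_doc_valid_check = True
    print("accepted:", add_document(scanned, settings=settings))
```

In this sketch the standalone ParsingSketch instance has no effect because nothing reads it, which would be consistent with the error persisting in the report above; the flag only matters on the settings object given to the adding call.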
Hello,
Sometimes, documents that contain images are not recognized as text documents. For this issue, paperqa recommends disabling document checking.
Could not read Auchy les mines (62) - Haisnes-lez-la Bassée.pdf This does not look like a text document: ./Papers/Auchy les mines (62) - Haisnes-lez-la Bassée.pdf. Pass disable_check to ignore this error.
How can I disable document checking to prevent this error?
Best regards.