-
Hello, I was wondering what the recommended way is to use local markdown files with paperqa. Looking at [readers.py](https://github.com/Future-House/paper-qa/blob/HEAD/paperqa/readers.py#L287) it seems markdow…
-
@reynoldsm88
Any double carriage return is going to introduce a sentence break during information extraction. So any time a double carriage return is in the middle of a sentence, that's quite de…
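To illustrate the point about double carriage returns, here is a minimal sketch of a pre-processing step that rejoins a paragraph break when it appears to fall mid-sentence. The helper name `join_mid_sentence_breaks` and the heuristic (previous block does not end in terminal punctuation and the next block starts with a lowercase letter) are assumptions made for illustration. They are not part of any library mentioned in this thread.

```python
import re

# Punctuation treated as "the sentence is finished" (an assumption; tune as needed).
TERMINAL = (".", "!", "?", ":", ";")


def join_mid_sentence_breaks(text: str) -> str:
    """Hypothetical helper: merge blocks separated by blank lines when the
    break appears to fall in the middle of a sentence."""
    # Split on runs of two or more line breaks (optionally followed by spaces/tabs).
    blocks = [b.strip() for b in re.split(r"(?:\r?\n[ \t]*){2,}", text) if b.strip()]
    merged: list[str] = []
    for block in blocks:
        prev = merged[-1] if merged else ""
        # Heuristic: no terminal punctuation before the break and a lowercase
        # letter after it suggests the sentence continues.
        if prev and not prev.endswith(TERMINAL) and block[0].islower():
            merged[-1] = prev + " " + block
        else:
            merged.append(block)
    return "\n\n".join(merged)


if __name__ == "__main__":
    sample = "The model was\n\ntrained on PDFs.\n\nNext paragraph."
    print(join_mid_sentence_breaks(sample))
```

This heuristic would also merge a heading (which has no terminal punctuation) into a following paragraph that starts with a lowercase letter, so it would need adjusting for documents with that structure.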
-
### Why is it worth adding this package?
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
### Home page URL
…
-
I often work with vector .pdf images. They contain essentially perfect representations of the data, but can be difficult to work with.
Given the integration with pdfjs, it would be interesting as …
-
Hi,
I use `pdfWriter = muhammara.createWriterToModify(localPdfPath,{modifiedFilePath:destPdfPath});` to create my pdfWriter so I can read and add annotations. It worked perfectly until now, when…
-
Whenever we send a PDF for extraction it seems to take the whole system down for a while. This is using the basic scenario PDF [found here](https://github.com/DARPA-ASKEM/knowledge-middleware/tree/mai…
-
chf_sufia displayed a "page count" for PDF original downloads.
But our current app is not extracting "page count" from PDFs.
While it's relatively easy to do that with shrine, the way we are d…
-
Is this due to the recent Obsidian update? I'm not sure. But I hope you will fix this soon enough. Thank you.
-
This may be a problem with tesseract, or a setting that can be applied when creating the instance to ocr as an option -- not sure if that is even the best place to address the issue to be honest. I fo…
-
### Environment
node v20.11.1
unpdf v0.11.0
### Reproduction
I got the original error in a server route of a Nuxt 3 project. Also, in the original app I performed other operations besides text/met…