-
**Describe the bug**
It does not work at all because it complains about deprecated pypdf classes.
**To Reproduce**
* Create a new virtualenv
* Install
* Try to run it
**Expected behavior**…
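The report does not quote the error. With pypdf, this kind of failure usually means the script still uses PyPDF2-era names (`PdfFileReader`, `getPage`, `extractText`, ...) that were deprecated and then removed. Assuming that is the cause here, the minimal stdlib-only sketch below locates such calls in a script. The rename table is partial, and `find_legacy_calls` is a hypothetical helper, not part of pypdf.

```python
import re

# Legacy PyPDF2-style names (deprecated, later removed) mapped to the
# current pypdf spelling. Partial table, for illustration only.
LEGACY_NAMES = {
    "PdfFileReader": "PdfReader",
    "PdfFileWriter": "PdfWriter",
    "getPage": "reader.pages[i]",
    "getNumPages": "len(reader.pages)",
    "numPages": "len(reader.pages)",
    "extractText": "extract_text",
}

# \b on both sides keeps e.g. "getPage" from matching inside "getPageNumber".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, LEGACY_NAMES)) + r")\b")


def find_legacy_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, legacy name, suggested replacement) per hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, LEGACY_NAMES[name]))
    return hits


if __name__ == "__main__":
    example = (
        "from PyPDF2 import PdfFileReader\n"
        "reader = PdfFileReader('in.pdf')\n"
        "text = reader.getPage(0).extractText()\n"
    )
    for lineno, old, new in find_legacy_calls(example):
        print(f"line {lineno}: {old} -> {new}")
```

Run over the failing script, it lists each legacy name with its replacement; for example, `reader.getPage(0).extractText()` becomes `reader.pages[0].extract_text()`.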
-
Add table extraction benchmark.
-
**New issue created about the problem of extra `\x00` characters caused by a mix-up between UTF-8 and 8-bit-only charsets**
@orgmast5 created:
> I'm trying to fill a form, I installed the latest build fr…
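The title does not say exactly where the NUL bytes come from. One common mechanism, assumed here rather than taken from the issue, works as follows. A PDF text string is either 8-bit PDFDocEncoding or Unicode marked by a byte-order mark (`FE FF` for UTF-16BE, and in PDF 2.0 also `EF BB BF` for UTF-8). Reading UTF-16BE bytes with an 8-bit codec leaves a `\x00` before every ASCII character. The sketch uses Latin-1 as a stand-in for PDFDocEncoding, which is a simplification. `decode_pdf_text_string` and `naive_decode` are hypothetical names.

```python
# Byte-order marks that flag a PDF text string as Unicode.
UTF16BE_BOM = b"\xfe\xff"
UTF8_BOM = b"\xef\xbb\xbf"  # allowed from PDF 2.0 on


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string, choosing the codec from its BOM."""
    if raw.startswith(UTF16BE_BOM):
        return raw[len(UTF16BE_BOM):].decode("utf-16-be")
    if raw.startswith(UTF8_BOM):
        return raw[len(UTF8_BOM):].decode("utf-8")
    # Simplification: Latin-1 stands in for PDFDocEncoding; the two agree
    # for printable ASCII but differ for some other code points.
    return raw.decode("latin-1")


def naive_decode(raw: bytes) -> str:
    """The bug pattern: treat every string as 8-bit and ignore the BOM."""
    return raw.decode("latin-1")


if __name__ == "__main__":
    raw = UTF16BE_BOM + "Name".encode("utf-16-be")
    print(repr(naive_decode(raw)))            # 'þÿ\x00N\x00a\x00m\x00e'
    print(repr(decode_pdf_text_string(raw)))  # 'Name'
```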
-
**Context**: I'm trying to replace my uses of `pdfrw` with `pypdf` in various scripts that I have,
and `pypdf` is still turning out to be noticeably slower...
In some cases, I'm seeing a 5x speed…
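A ratio like "x5" is easiest to compare across versions when both libraries are timed on the same file with the same harness. The sketch below is library-agnostic. `best_time` and `slowdown` are hypothetical helpers built on the standard `timeit` module. The two workloads in the demo are placeholders, to be replaced by, for example, reading one PDF with `pdfrw` and then with `pypdf`.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], repeat: int = 5) -> float:
    """Fastest of `repeat` single calls of fn, in seconds."""
    # number=1: each sample is one full call (e.g. one parse of a file).
    return min(timeit.repeat(fn, number=1, repeat=repeat))


def slowdown(baseline: Callable[[], object],
             candidate: Callable[[], object], repeat: int = 5) -> float:
    """Ratio candidate/baseline; 5.0 corresponds to a 5x slowdown."""
    return best_time(candidate, repeat) / best_time(baseline, repeat)


if __name__ == "__main__":
    # Hypothetical stand-ins for "open and walk the PDF with library A/B".
    def fast() -> None:
        sum(range(10_000))

    def slow() -> None:
        sum(range(50_000))

    print(f"candidate is {slowdown(fast, slow):.1f}x slower")
```

Taking the minimum of several runs, as the `timeit` documentation suggests, filters out interference from other processes. Note that `timeit` also disables garbage collection while timing, which can flatter allocation-heavy parsers.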
-
```
Stylo needs support for extracting text from PDF files.
Suggested library:
PyPDF - http://pybrary.net/pyPdf/
```
Original issue reported on code.google.com by `Matthew.Tornetta@gmail.com` on 16 …
-
While using `v3.0.79`, it seems that some PDFs are not parsed well when using `from pypdf import PdfReader` in `backend/danswer/file_processing/extract_file_text.py`.
The result is tha…
-
### Bug Description
Hey, so to sum it up, I create a `SimpleDirectoryReader` with a `PDFReader` as an extractor and an S3 bucket as `input_dir`, with S3 also as the `fs`.
Then I call `load_data()`, which leads…
-
It's possible I messed up the install somehow, but after running the `setup.py` script and trying to run `inkscapeslide` on an SVG already configured for inkscapeslide, I got this:
```
Traceback (most r…
```
-
I need to extract text from a PDF document using the `page.extract_text` function, but all the extracted Chinese characters are garbled. I suspect that this PDF document uses several special Chinese f…
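The report suspects the fonts. Assuming the garbling comes from how character codes are mapped back to Unicode (a common cause, not confirmed here), the sketch below shows the mechanism. A font's ToUnicode CMap maps character codes to Unicode, and when that mapping is missing or wrong, extraction emits the wrong characters. It handles only `bfchar` entries, not `bfrange`, and is not pypdf's implementation. `parse_bfchar` and `decode_two_byte_codes` are hypothetical helpers.

```python
import re

# Each bfchar entry pairs a source code with a UTF-16BE destination.
_BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Map character codes to Unicode strings from bfchar sections only."""
    mapping = {}
    for block in _BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in _PAIR.findall(block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_two_byte_codes(data: bytes, mapping: dict[int, str]) -> str:
    """Decode 2-byte codes (as with Identity-H fonts).

    Unmapped codes become U+FFFD; a trailing odd byte is ignored.
    """
    chars = []
    for i in range(0, len(data) - 1, 2):
        code = int.from_bytes(data[i:i + 2], "big")
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


if __name__ == "__main__":
    cmap = "2 beginbfchar\n<0001> <4E2D>\n<0002> <6587>\nendbfchar"
    table = parse_bfchar(cmap)
    print(decode_two_byte_codes(b"\x00\x01\x00\x02", table))  # 中文
```

If a font's table is absent or maps codes to the wrong values, the same bytes decode to replacement or unrelated characters, which matches the symptom described above.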
-
**Feature request**
Thanks for your suggestion on improving pdfminer.six. To help us discuss and
implement this request, please make sure to include the following information:
- There are a fe…