-
## Objective
Create a custom Tesseract model and use it through pytesseract to improve text extraction from PDF files. Compare the results with basic extraction using PyMuPDF or PyPDF2.
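One possible way to make the comparison concrete is to score each extractor's output against a hand-checked reference text. The sketch below is only an assumption about how that could look: `normalize` and `similarity` are hypothetical helper names, and the score is the standard-library `difflib` ratio rather than a dedicated OCR metric.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not dominate the score."""
    return " ".join(text.split())


def similarity(candidate: str, reference: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after whitespace normalization."""
    return SequenceMatcher(None, normalize(candidate), normalize(reference)).ratio()


if __name__ == "__main__":
    # Hypothetical strings standing in for a reference and two extractor outputs.
    reference = "Invoice number 12345"
    print(similarity("Invoice  number\n12345", reference))
    print(similarity("lnvoice numbcr 12345", reference))
```

Running the same scoring over the OCR output and the basic extraction output would give one number per method and per page, which is easier to track than eyeballing the text.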
## Key Features
- [ ] own model is t…
-
Starting with certain versions of PyPDF2, the PDF documents that I split and regenerate are losing PDF/A conformance (checked with https://avepdf.com/pdfa-validation). Those documents are not accepted by certa…
-
PyPDF2 now forces the use of the `PdfWriter` class instead of `PdfFileWriter`, so the library is broken.
Could you please update the code?
In the meantime, I'm trying to force `PyPDF2==2.12.1` in my…
-
The PDF file is attached
[pdf_sample_googlesheet_pages_02.pdf](https://github.com/claird/PyPDF4/files/4984802/pdf_sample_googlesheet_pages_02.pdf)
traceback:
File "/usr/local/lib/py…
-
I keep getting this error:
DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.
I already upgraded with `--upgrade`.
Thank you
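For what it's worth, upgrading cannot clear this error, because the message says the old class was removed in 3.0.0; the calling code has to switch to the new names. A minimal sketch of that change, assuming PyPDF2 3.x is installed (`open_reader` and `page_texts` are hypothetical helper names, not part of the library):

```python
def open_reader(path):
    """Open a PDF with the class that replaces the removed PdfFileReader."""
    # Imported lazily so page_texts() below stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return PdfReader(path)


def page_texts(reader):
    """Return the text of every page as a list of strings.

    reader.pages replaces the old getNumPages()/getPage(i) pair, and
    page.extract_text() replaces the old extractText().
    """
    return [page.extract_text() or "" for page in reader.pages]


# Usage, assuming a file named "example.pdf" exists:
# texts = page_texts(open_reader("example.pdf"))
```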
-
I sometimes run into the issue of PyPDF2 not working with certain PDFs that pdftotext handles fine. Are there any plans or solutions to have this library run pdftotext instead, or as an option?
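Until something like this exists in the library, a caller-side fallback might be enough. The sketch below is an assumption, not a PyPDF2 feature: `extract_with_fallback` and `pdftotext_extract` are hypothetical names, and the second one shells out to the Poppler `pdftotext` command, which must be installed and on `PATH`.

```python
import subprocess


def pdftotext_extract(path):
    """Extract text with the Poppler CLI; "pdftotext <file> -" writes to stdout."""
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def extract_with_fallback(path, extractors):
    """Try each extractor in order and return the first non-blank result."""
    errors = []
    for extract in extractors:
        try:
            text = extract(path)
        except Exception as exc:  # any failure just moves on to the next extractor
            errors.append(exc)
            continue
        if text and text.strip():
            return text
    raise RuntimeError(f"no extractor produced text for {path!r}: {errors}")


# Usage, with a hypothetical PyPDF2-based callable listed first:
# text = extract_with_fallback("file.pdf", [pypdf2_extract, pdftotext_extract])
```

Treating an empty result as a failure matters here, since a PDF that the first extractor cannot read may yield blank text rather than an exception.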
-
Currently, the data are copied manually into [the data file](./data/data_corona_varianten.txt). This should be automated.
Current options:
- camelot (currently [with runtime error](https://github.com…
-
I am porting my script from Python 2.7 to Python 3.3.
When I run the code `pdf = PdfFileReader(open('xxxx.pdf', 'rb'))`, the following error message appears:
```
Traceback (most recent call last):
File ..., in raw…
-
```markdown
# Title
- [chapter 1](chapter1.html)
- [chapter 2](chapter2.html)
```
Convert the HTML to PDFs and merge accordingly.
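The merge order can be read from the Markdown list itself. A minimal sketch of that first step, assuming one `- [title](file.html)` link per line as in the example above (`chapter_links` is a hypothetical helper; the HTML-to-PDF conversion and the merge are left to whichever tools are chosen):

```python
import re

# Matches a list item such as "- [chapter 1](chapter1.html)".
LINK_PATTERN = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)]+\.html)\)\s*$")


def chapter_links(markdown):
    """Return (title, html_file) pairs in the order they appear in the list."""
    links = []
    for line in markdown.splitlines():
        match = LINK_PATTERN.match(line)
        if match:
            links.append((match.group(1), match.group(2)))
    return links


if __name__ == "__main__":
    toc = "# Title\n- [chapter 1](chapter1.html)\n- [chapter 2](chapter2.html)\n"
    for title, html_file in chapter_links(toc):
        print(title, html_file)
```

Each returned file would then be converted to a PDF and appended to the output in list order.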
-