Open camipozas opened 2 years ago
Thank you for taking the time to fill out the issue template, it makes it much easier to help.
Is this only with one or a few PDFs?
Also, can you run pdftoppm -r 200 -jpeg your_file.pdf out
and see if that also gives you an error?
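If the check needs to be repeated over several PDFs, the same command can be driven from Python. This is only a minimal sketch: the helper names `build_pdftoppm_command` and `run_pdftoppm` are hypothetical, and it assumes `pdftoppm` is on the PATH inside the container.

```python
import subprocess


def build_pdftoppm_command(pdf_path, out_prefix, dpi=200):
    """Build the same command line as above: pdftoppm -r 200 -jpeg file out."""
    return ["pdftoppm", "-r", str(dpi), "-jpeg", pdf_path, out_prefix]


def run_pdftoppm(pdf_path, out_prefix, dpi=200):
    """Run pdftoppm directly and return (exit code, stderr text).

    A non-zero exit code or error text here suggests the failure comes
    from poppler itself rather than from the Python wrapper.
    """
    proc = subprocess.run(
        build_pdftoppm_command(pdf_path, out_prefix, dpi),
        capture_output=True,
        text=True,
        check=False,  # do not raise; we want to inspect the failure
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # "your_file.pdf" is a placeholder, as in the command above.
    code, err = run_pdftoppm("your_file.pdf", "out")
    print("exit code:", code)
    print(err)
```

Running it on one failing and one working PDF should show whether the error is reproducible outside pdf2image.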
Hello, I analyzed the PDFs that gave me an error and they had all been signed with DocuSign, although other DocuSign PDFs usually convert correctly. I don't know how to upgrade poppler-utils in Docker. I had read about this before in "Pdf2Image library failing to read pdf signed using docusign".
Hello, I solved the problem. The solution is to start from an Ubuntu image, then install Python (in my case), and then install my own dependencies. It's the only way for now... When I got inside the original container I saw this version of poppler:
poppler-utils:
Installed: 20.09.0-3.1
Candidate: 20.09.0-3.1
Version table:
* 20.09.0-3.1 500
500 http://deb.debian.org/debian bullseye/main amd64 Packages
100 /var/lib/dpkg/status
I know that I need 21.03.00 or later, so after applying the solution the image has:
poppler-utils:
Installed: 22.02.0-2
Candidate: 22.02.0-2
Version table:
* 22.02.0-2 500
500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
100 /var/lib/dpkg/status
If anyone has a question, please contact me. Happy to help.
What I still don't understand is what causes the miscount.
@faltunik sorry, I don't know the cause of the issue in detail. I only know roughly what triggers it and how to work around it.
This is a poppler issue, unfortunately, so there is not much that can be done on my side. I might add a check that raises a warning so that people are aware.
If you want, I can add the documentation to your project. I can make a fork and then open a PR.
I appreciate the offer, but I am not sure what's the best way/place to document this yet.
It could be:
For the code warning, it would use the warnings module (https://docs.python.org/3/library/warnings.html#warnings.warn):
warnings.warn(f"Detected poppler version {poppler_version_major}.{poppler_version_minor} is known to fail on some PDFs in rare cases")
A code warning is more intrusive and might be overkill, depending on how common this issue is.
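To make the idea concrete, here is a rough sketch of such a check. It is not pdf2image's actual implementation: the function names are hypothetical, the 21.03 threshold is only the figure reported earlier in this thread, and it assumes `pdftoppm -v` prints a line like "pdftoppm version 22.02.0".

```python
import re
import subprocess
import warnings

# Assumption from this thread: versions below 21.03 were reported to fail.
MIN_KNOWN_GOOD = (21, 3)


def parse_poppler_version(text):
    """Extract (major, minor, patch) from version text, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups("0"))


def detect_poppler_version(binary="pdftoppm"):
    """Ask the installed binary for its version; None if it is not found."""
    try:
        proc = subprocess.run(
            [binary, "-v"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # Read both streams, since the version banner may go to stderr.
    return parse_poppler_version(proc.stderr + proc.stdout)


def warn_if_old_poppler(version):
    """Emit a warning for versions below the threshold; return True if warned."""
    if version is not None and version[:2] < MIN_KNOWN_GOOD:
        warnings.warn(
            f"Detected poppler version {version[0]}.{version[1]} "
            "is known to fail on some PDFs in rare cases"
        )
        return True
    return False


if __name__ == "__main__":
    warn_if_old_poppler(detect_poppler_version())
```

Because it goes through the warnings module, users who find the message too noisy can silence it with the standard warning filters.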
@camipozas How do you check whether a particular PDF is a scanned PDF or not?
Describe the bug I am running an image in Docker to read a PDF, convert it to images and then to text (some of the documents are scanned), and I get the following error. Does anyone know why? I can't share the document :(
To Reproduce
Desktop (please complete the following information):
Additional context Dockerfile