Belval / pdf2image

A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
MIT License

pdf to tiff converting only first page #206

Closed frederick0291 closed 3 years ago

frederick0291 commented 3 years ago

Describe the bug: Converting a PDF to TIFF only converts the first page. I tried setting the first and last pages, and I also tried converting only even or only odd pages, but still only the first page gets converted.

To Reproduce: Steps to reproduce the behavior:

In Python: convert_from_path(file, output_folder=outdir, poppler_path=poppler_path, fmt="tiff", output_file=filename, grayscale=True)

On cmd: run pdftocairo -tiff sample.pdf .

Expected behavior: All PDF pages are converted to TIFF.

Desktop (please complete the following information):

Additional context: I noticed the error "No display font for ArialUnicode" and thought it was the root cause. I tried installing Arial Unicode MS, but the issue remained. Embedding the fonts with Acrobat Pro resolved the font error, but pdftocairo still converts only one page. Converting the PDF to other formats such as PNG or PPM successfully converts all pages.

sample.pdf
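Since the report says PNG and PPM output still covers every page, one possible stopgap is to let pdf2image return PIL images in its default format and have Pillow write the TIFF files. This is only a sketch under assumptions: the names tiff_paths and pdf_to_tiffs and the file naming pattern are hypothetical, and it has not been run against the affected Poppler build.

```python
from pathlib import Path


def tiff_paths(out_dir, stem, page_count):
    """Build one output path per page, named like scan-0001.tif (hypothetical pattern)."""
    return [Path(out_dir) / f"{stem}-{n:04d}.tif" for n in range(1, page_count + 1)]


def pdf_to_tiffs(pdf_path, out_dir, stem, poppler_path=None):
    """Convert every page via the default format, then save each page as TIFF."""
    # Imported lazily so the path helper above works without pdf2image installed.
    from pdf2image import convert_from_path

    # No fmt="tiff" here: the report says the non-TIFF formats convert all pages.
    pages = convert_from_path(pdf_path, poppler_path=poppler_path, grayscale=True)

    targets = tiff_paths(out_dir, stem, len(pages))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image, target in zip(pages, targets):
        # Pillow writes the TIFF here, so Poppler's TIFF output is not involved.
        image.save(target, format="TIFF")
    return targets
```

Calling pdf_to_tiffs("sample.pdf", "out", "sample") should then leave one .tif file per page in the out folder, at the cost of holding all pages in memory at once.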

arethis commented 3 years ago

I'm having the same issue, same OS / software versions listed above.

Belval commented 3 years ago

To clarify, you were able to reproduce this issue when using pdftocairo directly? If so this needs to be addressed but you should probably open an issue with the maintainers of poppler. pdf2image is only a wrapper around that library.

Open an issue here: https://gitlab.freedesktop.org/poppler/poppler

However in the meantime I can add a warning or even an exception to make sure that people don't accidentally use the broken feature.
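One way to check whether the problem reproduces with pdftocairo directly, without pdf2image in the loop, is to run the tool into a temporary folder and count the files it writes. This is a rough sketch, not part of pdf2image: the helper names are made up for illustration, and it assumes pdftocairo is on PATH (otherwise pass its full path as exe).

```python
import subprocess
import tempfile
from pathlib import Path


def pdftocairo_tiff_cmd(pdf_path, out_root, first=None, last=None, exe="pdftocairo"):
    """Build the pdftocairo command line for TIFF output."""
    cmd = [exe, "-tiff"]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]  # last page to convert
    return cmd + [str(pdf_path), str(out_root)]


def count_tiff_pages(pdf_path, exe="pdftocairo"):
    """Run pdftocairo and return how many TIFF files it produced."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "page"
        subprocess.run(pdftocairo_tiff_cmd(pdf_path, root, exe=exe), check=True)
        # Match both .tif and .tiff, since the extension is not assumed here.
        return len(list(Path(tmp).glob("page*.tif*")))
```

If count_tiff_pages("sample.pdf") returns 1 for a multi-page file, the problem is in the Poppler build rather than in the wrapper.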

arethis commented 3 years ago

After getting a better understanding of how pdf2image works, I also suspected that this was an issue with pdftocairo, and that is indeed the case: the same issue occurs when converting a PDF to a TIFF file by running pdftocairo directly.

To provide some additional information to anyone else encountering this issue: it appears to be specific to Windows 10, or to the version of Poppler currently available on Windows (21.03, while the current version of Poppler is 21.08). The current version of Poppler works correctly on a base system install of Arch Linux (Release 2021.08.01).

The projects that build the Windows binaries for Poppler (see below) are several versions behind, and it's unclear when they will catch up with the current version of Poppler.

Poppler Windows Repos:
https://github.com/oschwartz10612/poppler-windows
https://github.com/conda-forge/poppler-feedstock
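To see which Poppler version a given install actually provides, the version banner printed by pdftocairo -v can be parsed. The sketch below assumes the banner contains text of the form "version 21.03.0" and searches both stdout and stderr; the function names are hypothetical.

```python
import re
import subprocess


def parse_poppler_version(banner):
    """Extract (major, minor, patch) from a banner such as 'pdftocairo version 21.03.0'."""
    match = re.search(r"version (\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def installed_poppler_version(exe="pdftocairo"):
    """Run 'pdftocairo -v' and parse whatever banner it prints."""
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Look in both streams rather than assuming which one carries the banner.
    return parse_poppler_version(result.stdout + result.stderr)
```

Comparing the returned tuple across machines makes it easier to tell whether two reports are really about the same build.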

frederick0291 commented 3 years ago

Opened an issue in gitlab: https://gitlab.freedesktop.org/poppler/poppler/-/issues/1112

As per the maintainer, pdftocairo is working for him. This might be an issue in the version (21.03) we are using?

Belval commented 3 years ago

No luck. It seems the feedstock is stuck on an odd compilation issue. I will update this issue as we get more information.

frederick0291 commented 3 years ago

Hi everyone.

I downloaded the latest Windows release, poppler-21.08.0, from https://github.com/oschwartz10612/poppler-windows/releases/tag/v21.08.0 and tested again with pdftocairo. It still converts only the first page. I raised the issue on the main poppler repo in GitLab, and they said that it was working properly on their end.

Tried adding first and last page with -f and -l options. Tried using -e and -o options. All of the above converted only the first page of the series.

I will try using the one in gitlab. I am not sure where the issue is coming from. It might be silently failing after the first page.

frederick0291 commented 3 years ago

Let me know if I need to close this issue as it is an issue coming from the poppler-windows version.

Belval commented 3 years ago

I think it's ok to leave it open for now as we don't have a proper resolution and I would like to keep track of the issue. Make sure to report back if you find a solution.

frederick0291 commented 3 years ago

Hi @Belval, I think the maintainers of poppler-feedstock found the cause of this issue. I will request an update from poppler-windows to see if it fixes the problem. https://github.com/conda-forge/poppler-feedstock/issues/111

frederick0291 commented 3 years ago

Confirming that the issue has been fixed with the new update of poppler-windows: pdftocairo now converts all pages to TIFF. Thanks!