edrlab / thorium-reader

A cross platform desktop reading app, based on the Readium Desktop toolkit
https://www.edrlab.org/software/thorium-reader/
BSD 3-Clause "New" or "Revised" License

Win10 (not MacOS) PDF import error of "no such path in zip: cover.png" for many PDFs - example file linked to. #1939

Closed byjosh closed 1 year ago

byjosh commented 1 year ago

I can't find an open or closed issue that matches this one, though #1335 may relate to the underlying cause. I hope I have given the necessary detail to replicate it.

The PDF at https://www.vmware.com/content/dam/learn/en/cloud/pdf/Managing_Kubernetes_eBook.pdf (sha256 hash b4cc83bd201640ef625ac261021d9dc29b3c8992793fe9e454d2ba4e1fe578f9) fails to import into Thorium with the message "no such path in zip: cover.png". The file was saved in an ordinary location, C:\Users\cis\Downloads, and the failure occurs on Windows 10 (Pro and Home versions).

This actually happens with quite a few other PDFs (often O'Reilly developer publications), hence this bug report; the example file is just one that is publicly available. Firefox's built-in PDF reader displays the cover correctly for the example file, but not for some of the other files.

The error occurs with Thorium version 2.2.0.0, in both the Windows Store version and the GitHub installer version, running on Windows 10 version 22H2 build 19045.2728, Windows Feature Experience Pack 120.2212.4190.0. That version and build number are from the Win 10 Pro machine I tried first; using the Windows Store version of Thorium, I replicated the error on a Win 10 Home machine with the same Windows version (22H2) and build number. So it is not specific to the machine I first tried this on.

On the PDFs where Firefox's reader does not load the cover fully but does read the rest of the PDF (indicating that the issue in Thorium may be an ungraceful failure), Microsoft Edge is able to read the cover OK (the O'Reilly Programming Rust title is an example of a file I saw this behaviour with). I see from #1335 that there are various PDF.js versions involved. In Firefox (v 111.0.1) the tab does not close, and the rest of the PDF is viewable and readable even if the cover does not show correctly on some files. In Thorium the window closes and the import fails, so reading these PDFs on Windows 10 requires Firefox or Edge. So the issue is the combination of Thorium and PDF.js, not PDF.js in every setting.

The error does not occur on macOS Big Sur using the Thorium GitHub Mac installer (v.2.2.0). The sha256 hash is the same for both downloaded copies of the file; my first thought was a corrupted download, but I think the matching hash values rule that out.
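For anyone wanting to repeat the hash comparison described above, here is a minimal sketch in Python. The function names `sha256_of_file` and `matches_expected` are made up for illustration; only the expected hash value comes from this report.

```python
import hashlib

# Hash quoted in this report for Managing_Kubernetes_eBook.pdf.
EXPECTED = "b4cc83bd201640ef625ac261021d9dc29b3c8992793fe9e454d2ba4e1fe578f9"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED):
    """True when the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected(...)` on the downloaded copy on each machine (the local file name is whatever the browser saved it as) should return `True` on both if the downloads are byte-identical to the file hashed here.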

As the file name and location are very standard and contain no characters that one would not find on a US/UK keyboard, I doubt this is an issue with the path or filename, or a failure to cope with non-English characters.

I hope with the example file & hash value, the Thorium and Windows version info and having verified it on more than 1 machine I have provided enough info to make this a replicable bug.

danielweck commented 1 year ago

Hello, thank you very much for your detailed description of the problem. Could you please download and try the "latest windows" build of Thorium: https://github.com/edrlab/thorium-reader/releases

byjosh commented 1 year ago

Hi, I installed the 2.3 alpha from build run 4499631425 and that displays the same behaviour: opens but then closes the sample PDF (so that one cannot actually read it in Thorium unlike in Firefox or Edge) and fails to import it - with "no such path in zip: cover.png" message. That was under Win 10 Home (Win 10 version/build as above).

So the current 2.3 branch does not seem to fix it. My observation is that the example file may be considered malformed by some standard (Thorium loads other files, ebooks that seem to have covers, just fine). So a combination of PDF structure knowledge, Thorium debugging knowledge and PDF.js debugging knowledge (neither of which I have) might reveal how to fail less catastrophically and more gracefully with this kind of file.
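To illustrate what "failing more gracefully" could look like, here is a small Python sketch. It is not Thorium's code, and I do not know how Thorium builds or reads its package. The only detail taken from this report is the entry name `cover.png` from the error message; the function name `read_cover` and the idea of returning `None` for a missing cover are assumptions for illustration.

```python
import zipfile


def read_cover(package, entry_name="cover.png"):
    """Return the cover bytes from a zip package, or None if the entry is absent.

    `package` may be a path or a binary file-like object. A missing cover is
    treated as a recoverable condition rather than a fatal error.
    """
    with zipfile.ZipFile(package) as archive:
        if entry_name not in archive.namelist():
            # No cover entry: let the caller fall back to a placeholder.
            return None
        return archive.read(entry_name)
```

With a check like this, an importer could continue without a cover image (or with a generic placeholder) instead of aborting the whole import when the entry is missing.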

The one thing I can do is this: I have a copy of Acrobat Professional, which I think shows me a bit of the document structure. I'll examine the test file and another one that imports fine and see if I spot any obvious difference (e.g. if it specifies the cover image differently in the problem file vs the functioning file).

byjosh commented 1 year ago

So, comparing the reference (problem) file with https://www.nginx.com/wp-content/uploads/2017/07/Complete-NGINX-Cookbook-2019.pdf, which loads fine, this is what I am seeing in Acrobat Professional (View > Navigation Panels > Content). The problem file has a simple first page composed of Annotations, a Path and two XObject Images, which have identical dimensions and are not enclosed in any other container. The functioning comparison file, the NGINX Cookbook, has a much more complex first page, and its images are enclosed in a Container <Figure>, which holds the XObject Image (shown as "XObject: Image w: 1809 h:1022"), whereas in the problem file the images are at the top level.

However, O'Reilly's Programming Rust (by Blandy & Orendorff), where only a 6mm horizontal strip of the crab image on the cover loads in Firefox and the import fails in Thorium, has its images wrapped in Container <Figure> objects.

So XObject Images that are not enclosed in Container objects are not the single cause of the issue. At this point I do not know enough to help further, and would hope that debugging Thorium would give more informative error messages about what code is failing and what that code requires from the PDF file, yet is not getting in the case of the problem file.

danielweck commented 1 year ago

Hello, could you please try again with the latest automated test build:

https://github.com/edrlab/thorium-reader/releases

danielweck commented 1 year ago

Hello, I am unable to reproduce this behaviour with Thorium v2.3.0 on Windows 11 (see attached screenshots). I am closing this issue, but please feel free to chime back in for further feedback. Thank you!

https://www.vmware.com/content/dam/learn/en/cloud/pdf/Managing_Kubernetes_eBook.pdf

[Attached screenshots: Screenshot 2023-08-03 180959, Screenshot 2023-08-03 181026, Screenshot 2023-08-03 181052]

byjosh commented 1 year ago

Hi, I did test the build you indicated on my Windows 10 setup, without success. But if you previously had an error on Windows 11 and the error has gone with the new build, then that is some progress.