Conversion issue: PDF to CBZ

Sn1cket commented 1 year ago

Hello,

I have a couple of comics in PDF format that I would like to convert to CBZ. I was able to convert most of my files but a few won't convert with the following error message:

Converting
D:\comic.pdf
Extracting Pages...
Error: Requesting object that isn't resolved yet img_p0_2.
Couldn't convert the file, an error ocurred

Any clues what the problem could be?

binarynonsense commented 1 year ago

From the error message, it seems like the pdf library I use isn't able to find the embedded data for one of the images/pages and can't render it; the data may be missing or the library may have a bug. Right now I cancel the conversion/extraction process if there's any error. If I can't fix it I could just skip pages with problems, but that could be confusing for the user if I just print a warning or something like that and they don't notice and think the conversion was completely fine... although I could add an option to explicitly allow it. I understand if you can't, but could you share a file I could use to test possible solutions? I don't know when/if I'll be able to work on the program but if I have the chance in the future it would be useful.
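A minimal sketch of the trade-off described above: abort on the first failing page by default, and skip pages only when the user explicitly opts in. This is not ACBR's code; `extract_pages`, `render_page` and `allow_skipping` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExtractionResult:
    pages: List[bytes] = field(default_factory=list)
    skipped: List[int] = field(default_factory=list)  # 1-based numbers of pages that failed


def extract_pages(
    page_count: int,
    render_page: Callable[[int], bytes],  # hypothetical renderer; assumed to raise on failure
    allow_skipping: bool = False,
) -> ExtractionResult:
    """Render pages 1..page_count, aborting on the first error unless skipping is allowed."""
    result = ExtractionResult()
    for page_number in range(1, page_count + 1):
        try:
            result.pages.append(render_page(page_number))
        except Exception:
            if not allow_skipping:
                raise  # default: cancel the whole conversion
            result.skipped.append(page_number)  # opt-in: remember the page and keep going
    return result
```

Returning the list of skipped pages lets the caller show them prominently at the end, which addresses the concern that a single warning line could go unnoticed.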

Sn1cket commented 1 year ago

Yes, I could share a file with you. Is there a way to share this file privately with you on GitHub?

binarynonsense commented 1 year ago

Not that I know of. You could send me an email if you want (you can find my contact email on my site, www.binarynonsense.com) or send me a DM on twitter (the link is also on my site) with a link to mega/dropbox/google drive... whatever is easier for you.

Sn1cket commented 1 year ago

I sent you an email.

binarynonsense commented 1 year ago

Thanks, I got it and was able to replicate the bug. As I said, I don't know when/if I'll have the time to work on it, due to some things going on in my life right now, but if things improve and I manage to in the future this will be very helpful. Thanks for taking the time to report the bug!

Sn1cket commented 1 year ago

Yeah, sure. No problem.

Kered commented 1 year ago

Hi there @binarynonsense! In case additional examples help for fixing the problem, this same error is happening when trying to convert the PDFs in the latest comic Humble Bundle (featuring Elfquest). I see someone recommended ACBR there and also noted the error. The PDF viewer in ACBR seems to handle the files fine, it's just the PDF to CBR converter having an error.

binarynonsense commented 1 year ago

Thanks for letting me know the bug is happening in more cases! I probably won't be able to work on this for now but if/when I can I'll try to make the samples Sn1cket sent me work, as I'm not planning on getting that particular bundle but the cause is probably related. I haven't bought any in a while for budget reasons but originally the main motivation I had to add the conversion tools to my reader was to convert/resize some comics from humble bundles I had bought (and most of the files I use to test new versions are from them), so I'm a bit sad it's now failing with files from recent ones and I'd really want to work on a fix for this (and some other improvements I had planned)... hopefully I'll be able to do it some day in the near future but I can't promise anything for sure as things are a bit rough right now and I have problems finding the time.

Kered commented 1 year ago

No problem, I completely understand! If it helps any for when you have time to look into this, I'm happy to gift you the $1 tier of this bundle so that you have a pair of extra examples to work with, if you're comfortable DM-ing me an email address.

Another thing that might help you out - I discovered that forcing the DPI settings for PDF extraction to something other than using the embedded image's DPI seems to get around the error. Maybe these PDFs with errors have some kind of weird DPI setting, or maybe are somehow missing it altogether?

binarynonsense commented 1 year ago

This was bothering me too much so I made some time tonight (things were quiet and sleep is overrated ;)) to take a look at it and I think I found a solution. The bug seems to go away with the newest version of the pdf library I'm using to render the pages, but it seems to be slower... so I decided to keep using the older version for a first pass but keep track of the failed pages and try converting them using the newer version in a second pass if needed. It seems to work with the couple of files I have that triggered the bug before, so I've uploaded a new beta version I've quickly put together to test it more thoroughly. @Sn1cket and @Kered, if you have the time, please try your files with this beta and let me know if it now works.
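The two-pass idea described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual ACBR implementation: `fast_renderer` stands in for the older library version, `robust_renderer` for the newer one, and both are hypothetical callables that take a 1-based page number and raise on failure.

```python
from typing import Callable, Dict, List

Renderer = Callable[[int], bytes]  # hypothetical: page number in, image data out


def render_two_pass(
    page_count: int,
    fast_renderer: Renderer,
    robust_renderer: Renderer,
) -> List[bytes]:
    """Render every page with the fast renderer, then retry only the failures."""
    rendered: Dict[int, bytes] = {}
    failed: List[int] = []

    # First pass: use the faster renderer and keep track of the pages that fail.
    for page_number in range(1, page_count + 1):
        try:
            rendered[page_number] = fast_renderer(page_number)
        except Exception:
            failed.append(page_number)

    # Second pass: only the failed pages go through the slower renderer.
    # An error here is not caught, so the conversion still fails loudly.
    for page_number in failed:
        rendered[page_number] = robust_renderer(page_number)

    # Return the pages in their original order.
    return [rendered[n] for n in range(1, page_count + 1)]
```

Because the slower renderer only runs for pages that failed, files that never trigger the bug pay no extra cost.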

Kered commented 1 year ago

I just did some basic testing with your new beta version, and I was able to convert the two problematic PDFs to CBRs!

At first, I did briefly experience some weirdness where opening a PDF kept crashing ACBR, and I couldn't drag and drop files onto an open ACBR window to view that file. But then I couldn't recreate that problem again, so I'm thinking it might have been something to do with having two versions of ACBR on my PC, or me mixing up which version I was running? Or (maybe?) the new version compiling some library code behind the scenes upon first use? Whatever the case, it doesn't seem to be happening now, so if you and @Sn1cket don't have the same problem, then I wouldn't worry about it.

Note: I'll be away most of this coming week, so if you do anything else and need additional testing, it'll be a while before I can do more. Thanks for looking into this so quickly, and sorry it interrupted your sleep! :)

binarynonsense commented 1 year ago

Thanks for testing it so soon!

I'm glad the conversion seems to work now with the files that gave you trouble.

I have no idea what could have caused the initial crashing/weirdness, as I've only changed code related to the conversion and extraction tools. The reader should behave exactly the same, and using this version and the previous one on the same PC should be totally fine (they are practically the same: config file format, libraries used...). I also couldn't reproduce it on any of my computers (although only one of them uses Windows, so that's not a big enough sample size to rule anything out if that's the OS you are using). Seeing that it fixed itself later I'll try not to worry too much about it for now :)

I don't have the time to make and test a proper release for the moment (and I'd want to add other things in my TODO list for the next full version) but, in the meantime, hopefully this beta will help people having similar troubles with some pdf files (if that's the case please let me know here if the beta fixes them).

Sn1cket commented 1 year ago

Thanks @binarynonsense for your effort, and even for sacrificing your sleep.

I tested the changes on my side and I have good news and bad news. The three files I initially had problems with do convert without problems now, but in the meantime I bought another Humble Bundle and one of the comics in the bundle still won't convert. Maybe the library you use will fix this in a future release as well.

binarynonsense commented 1 year ago

No problem, thanks for taking the time to test it.

That's disappointing... Does it output the same error as before? And does the file work ok in the reader itself (all pages rendering correctly)? Could you send me a test file that triggers the error so I can try figuring out if there's something I can do? I updated the pdf library I'm using to its final version so that won't help anymore for now, but maybe the way I'm using it to generate the pages in the converter may be tweaked somehow to avoid the issue without having to wait/hope for a future fix in the library.

binarynonsense commented 1 year ago

By the way, if you choose 300dpi for the "PDF Extraction" option in the Advanced options of the converter does the conversion work? I'm assuming it would as, if the problem is similar to the previous ones, it doesn't happen in the reader itself and trying that worked for @Kered's file. If that's the case I could generate the pages that still give errors using 300dpi instead of failing the conversion if I don't find a better solution, in fact the default option is named "use the embedded image's info if possible, 300dpi if not", so that's probably what I should be doing already if I followed my own description :) (I just didn't think about failing to get the info only for some of the pages). Do you think that could be a good solution/compromise? Trying to use the embedded info is something I added in the latest versions, because it sounded like the best way to get the closest render of the original images without relying on the dpi info of the pdf file itself as it seems to be not always correct or well defined, and to achieve it I do some things with the library that may not be its traditional use case :)

Sn1cket commented 1 year ago

> Does it output the same error as before?

Similar error with different page. Error: Requesting object that isn't resolved yet g_d0_img_p37_1.

> And does the file work ok in the reader itself (all pages rendering correctly)?

Yes, the reader has no problems.

> Could you send me a test file that triggers the error so I can try figuring out if there's something I can do?

Do you still have the link I shared with you? You can now find the problematic file under this link.

> By the way, if you choose 300dpi for the "PDF Extraction" option in the Advanced options of the converter does the conversion work?

Yes, this works.

> If that's the case I could generate the pages that still give errors using 300dpi instead of failing the conversion if I don't find a better solution, in fact the default option is named "use the embedded image's info if possible, 300dpi if not", so that's probably what I should be doing already if I followed my own description :) (I just didn't think about failing to get the info only for some of the pages). Do you think that could be a good solution/compromise?

That sounds like a great idea.

binarynonsense commented 1 year ago

Thanks! I got the test file and was able to reproduce the bug.

Ok, seems like we have a plan. When I have the time to work on it I'll try to see if I find a way to fix it but, at the very least, I'll fall back to the potentially less accurate 300dpi method when finding a page I'm unable to extract the embedded info from.

I'll let you know when I have a new beta to test.
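As a rough illustration of the fallback plan above (use the embedded image's info when it can be read, 300dpi otherwise, decided per page), here is a minimal sketch. It is not taken from ACBR: `read_embedded_dpi`, `choose_dpi` and `points_to_pixels` are hypothetical names, and the real converter may derive the render size differently. The only fixed fact used is that PDF page sizes are expressed in points of 1/72 inch by default.

```python
from typing import Callable, Optional, Tuple

FALLBACK_DPI = 300  # the fallback value discussed in this thread


def choose_dpi(
    page_number: int,
    read_embedded_dpi: Callable[[int], Optional[int]],  # hypothetical; may raise or return None
) -> Tuple[int, bool]:
    """Return (dpi, used_fallback) for a single page."""
    try:
        dpi = read_embedded_dpi(page_number)
    except Exception:
        dpi = None  # e.g. the embedded image data could not be resolved
    if dpi is None or dpi <= 0:
        return FALLBACK_DPI, True
    return dpi, False


def points_to_pixels(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Convert a page size in PDF points (1/72 inch) to pixels at the given dpi."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)
```

Deciding per page rather than per file means one unreadable page no longer fails the whole conversion, while every other page still uses the embedded info.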

binarynonsense commented 1 year ago

Sorry it took me so long, but I've uploaded a new beta with the fallback solution I proposed previously. I haven't had the time to try finding a different way to use the library without triggering the bug but falling back to the 300dpi method in the rare cases it happens seems like a good compromise. I don't know when I'll be able to work on the next big version (2.5), but if this works well I'll try to make a new stable release (2.4.2) with just this update whenever I can, so people downloading the program don't experience the bug.

Kered commented 1 year ago

I've tried a handful of PDFs (including the original ones giving me a problem) and CBR files, and they all converted fine!

Incidentally, there are some really strange PDF files in that Elfquest Humble Bundle I mentioned. Not only did most of the volumes in it trigger the original problem mentioned above, but Dark Horse provided some really HUGE PDF files for volumes 4 and 5 in that bundle (somewhere around 4-5 GB in size and having way higher DPI than is necessary). ACBR couldn't even open them without choking, though to be fair, neither could Microsoft Edge.

I took incredible pains to get those two volumes converted to CBR files in ACBR... I first tried every application on my Win 10 PC that could handle PDFs, and found none that could both open those files and also flatten them to a new PDF file that ACBR could open. I eventually came across IrfanView (a more general purpose image viewer), installed its PDF plugins, and found that it could make a somewhat smaller PDF file of the two huge Elfquest volumes. I was then able to convert those two new files in ACBR while scaling them down further to create more appropriately sized CBR files.

I've since deleted those two original really big PDFs from my system, so I can't test them now, but I imagine they would still have a problem with this new beta version of ACBR. I don't really see that as being an issue with ACBR, though, as Dark Horse created them in a really non-standard manner.

binarynonsense commented 1 year ago

4 GB is really huge! I originally had the idea to add the conversion tools in ACBR after having to use command line tools to resize some big files from humble bundles, cbzs in that case, but I don't remember any so big, seems quite excessive :) To be honest, I've never tested it with such a big pdf, but it doesn't surprise me that it struggles/fails with them, as the pdf library seems to get slower the bigger the file is. If you didn't find any conversion problems with 'normal' or reasonably big pdfs I think I'm happy enough to settle for this solution. As soon as I can I'll try to make a new stable version with it. I'll leave the issue open a bit more just in case and then close it, but if you find any similar problem feel free to reopen it or open a new one. Thanks for the help!

binarynonsense / comic-book-reader

Conversion issue: PDF to CBZ #21