Michael-F-Bryan / mdbook-epub

An experimental mdbook backend for creating EPUB documents.
https://michael-f-bryan.github.io/mdbook-epub/
Mozilla Public License 2.0
370 stars, 46 forks

[BUG] mdbook-epub builds malformed zip archive in specific configurations #99

Closed epifirumu closed 5 months ago

epifirumu commented 5 months ago

Hello, I've encountered an issue with mdbook-epub where duplicate files are being added to the EPUB zip archive.

Description

When generating an EPUB, if the cover image defined in book.toml is also referenced from a markdown file for display, mdbook-epub adds the same file to the EPUB archive twice. File names within one directory should be unique, but neither epub-builder nor zip-rs prevents the generation of such a malformed zip file.

Steps to Reproduce

I've created a minimal reproducible example at epifirumu/mdbook-epub-example, which uses GitHub Actions to generate an EPUB file that exhibits the issue. It follows these steps:

  1. Create a book with mdbook and include a cover image in the book.toml configuration (e.g., cover-image = "assets/cover.png").
  2. Write a simple markdown chapter that includes the same cover image, e.g. ![Cover](assets/cover.png).
  3. Build the EPUB version of the book with mdbook-epub.
  4. Inspect the generated EPUB file and locate the image file under the OEBPS directory.
  5. You will find multiple entries with the same filename for the cover image. [Image: 7zip]
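
The check in step 5 can also be done programmatically once the archive's entry names have been listed. Below is a minimal sketch, assuming the names are already available as strings. `duplicate_entries` is a hypothetical helper, and the entry names in `main` are illustrative only (just the OEBPS directory and assets/cover.png come from this report). Reading the names out of the EPUB itself is left out.

```rust
use std::collections::HashMap;

/// Returns every entry name that occurs more than once, sorted for stable output.
/// Hypothetical helper: it only inspects names, it does not read the archive.
fn duplicate_entries<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for &name in names {
        *counts.entry(name).or_insert(0) += 1;
    }
    let mut dups: Vec<&'a str> = counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(name, _)| name)
        .collect();
    dups.sort();
    dups
}

fn main() {
    // Illustrative listing; real entry names depend on the book being built.
    let names = [
        "mimetype",
        "OEBPS/assets/cover.png",
        "OEBPS/chapter_1.html",
        "OEBPS/assets/cover.png",
    ];
    println!("duplicated entries: {:?}", duplicate_entries(&names));
}
```

An empty result means every entry name in the listing is unique.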

Impact

Some EPUB readers, especially those relying on java.util.zip.ZipFile on Android (e.g., Lithium and Neat Reader), are unable to parse the EPUB generated by mdbook-epub because of the duplicate file entries. [Image: debug]

Additional Context

This seems to occur because the processes for adding assets from markdown code and adding the cover image to the zip archive are separate, and the HashMap used in assets' de-duplication is not shared between these procedures. The relevant code might be located in mdbook-epub/src/resources/resource.rs and mdbook-epub/src/generator.rs. (Please note that I'm a novice in Rust, so my analysis might not be entirely accurate.)
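
To illustrate the idea, here is a minimal sketch of routing both the markdown assets and the cover image through one shared "already added" set. This is not mdbook-epub's actual code: `Archive` and `add_once` are hypothetical names, a `HashSet` stands in for the HashMap mentioned above, and assets/diagram.png is a made-up second asset.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the archive writer: each path is recorded at most once.
struct Archive {
    seen: HashSet<String>,
    entries: Vec<String>,
}

impl Archive {
    fn new() -> Self {
        Archive {
            seen: HashSet::new(),
            entries: Vec::new(),
        }
    }

    /// Adds `path` only if it has not been added before.
    /// Returns true when the entry was written, false when it was skipped.
    fn add_once(&mut self, path: &str) -> bool {
        if self.seen.insert(path.to_string()) {
            self.entries.push(path.to_string());
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut archive = Archive::new();

    // Assets referenced from markdown files.
    for path in ["assets/cover.png", "assets/diagram.png"] {
        archive.add_once(path);
    }

    // The cover image from book.toml goes through the same check,
    // so a second copy of the same path is skipped.
    let added = archive.add_once("assets/cover.png");
    println!("cover added again: {}, entries: {:?}", added, archive.entries);
}
```

Because both procedures consult the same set, the order in which the cover and the markdown assets are processed does not matter.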

Related issues: A solution provided in a comment on issue #28, which re-zips the EPUB archive to solve the "protected by Adobe DRM" error on Kobo eReader, also works for this issue. An issue at transky-book/transky#42 (in Chinese) initially reported the problem of duplicated cover files in the EPUB.

Environment

Thanks for reading this. If there are any other additional steps needed on my part, please don't hesitate to inform me.

blandger commented 5 months ago

I looked into your example. The resource is duplicated because you have it in two places. Your assumptions are quite correct.

  1. The first place is ![Cover](assets/cover.png) in the MD file.
  2. The second place is cover-image = "assets/cover.png".

When the book is processed, those resources are processed 'independently', because in general they are completely different images. The first resource is added to the EPUB as an 'image'; the 'cover' is added in a later step as a separate resource.

"...However, both epub-builder and zip-rs did not prevent the generation of such malformed zip files."

Yes, that is correct. Those libs are used internally by mdbook-epub and they do not check that.

I'm not sure why you put the 'cover' into the first MD page (place 1 above). If you want to have a cover as a 'thumbnail', you should have it as a completely separate 'cover image file' that will be included as a separate resource and displayed in the EPUB correctly. There is no sense in putting it on the first MD page.

"...This seems to occur because the processes for adding assets from markdown code and adding the cover image to the zip archive are separate, and the HashMap used in assets' de-duplication is not shared between these procedures."

I suppose that is correct behavior.

So the solution is to create another image as the 'cover' OR not to put the cover image into any MD file.

epifirumu commented 5 months ago

"I'm not sure why do you put 'cover' into first MD page #1. If you want to have a cover as 'thumbnail' you should have it as completely separate 'cover image file' that will be included as separate resource and will be displayed in epub correctly. There is no sense to put it on the first MD page file"

I realize my initial example might have come across as somewhat unconventional. To clarify, the example I provided is just a minimal bug reproduction for troubleshooting. It was stripped of complexity to demonstrate the issue with as few variables as possible. Placing the cover image on the first page of the book may indeed seem to make no sense, since a real book is rarely less than a page long.

In practical scenarios like the novel we are maintaining as an open-source project, the cover image doubles as a piece of character art within an illustration collection that accompanies the novel.

Given that this novel is open-source and freely available rather than a commercial publication, the image is designed to be both a cover and a standalone piece of art, without the branding, publication details, and other elements of commercial books that might detract from its visual appeal. Consequently, we have reused the same image for the EPUB cover and within the illustration collection.

Thanks for your guidance on this matter. We've currently implemented the workaround you suggested by creating a copy of the image for the cover and refraining from including that copy in any markdown file.