fire-eggs / DarkThumbs

Adds thumbnail preview to Windows Explorer for EPUB, FB2, DJVU and Kindle ebooks; 7Z, CB7, CBZ, CBR, RAR and ZIP archives.

Not the right cover for some epub #9

Closed s-kocher closed 3 years ago

s-kocher commented 3 years ago

Hello, thanks to the EPUB fix in the latest release (1.1), I can view the results without a crash :)

Some EPUBs don't get the right cover, though. In comparison, Calibre and SumatraPDF choose the right one.

Case 1

The content.opf file of the EPUBs for which DarkThumbs picks the wrong cover doesn't contain any cover information. After I mark the right image as the cover with the Calibre Editor, the following lines are added to content.opf, and DarkThumbs then picks the right cover:

<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
  <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:publisher>...</dc:publisher>
    <dc:language>en</dc:language>
    <dc:creator opf:file-as="Неизв." opf:role="aut">xxx</dc:creator>
    <meta name="calibre:timestamp" content="2021-03-03T20:51:55.968000+00:00"/>
    <dc:title>xxxxx</dc:title>
    <dc:date>2021-03-02T21:00:00+00:00</dc:date>
    <dc:contributor opf:role="bkp">calibre (1.5.0) [http://calibre-ebook.com]</dc:contributor>
    <dc:identifier id="uuid_id" opf:scheme="uuid">49953084-6f87-4f00-8763-7cffe4159ad5</dc:identifier>

    <!-- Added after marking the right image with calibre editor as cover -->
    <meta name="cover" content="cover-image"/>
    <!-- Added after marking the right image with calibre editor as cover -->

  </metadata>
  <manifest>
    <item href="EPUB/images/9781284157468_APP_TAB02-01.png" id="itabap-2-1" media-type="image/png"/>
    <!-- ... -->
    <item href="EPUB/images/9781284157468_EQU26-1.png" id="i9781284157468_EQU26-1" media-type="image/png"/>
    <item href="EPUB/images/9781284157468_FC.jpg" id="cover-image" media-type="image/jpeg"/>
    <!-- ... -->
    <item href="EPUB/images/pub.jpg" id="iJBLRN_LOGO_Colo1cK" media-type="image/jpeg"/>
    <item href="EPUB/xhtml/01_Titlepage.xhtml" id="i01_Titlepage" media-type="application/xhtml+xml"/>
    <!-- ... -->
    <item href="EPUB/xhtml/12_Chapter01_01.xhtml" id="i12_Chapter01_01" media-type="application/xhtml+xml"/>
    <!-- ... -->
    <item href="EPUB/xhtml/46_Index_split_001.xhtml" id="i46_Index9" media-type="application/xhtml+xml"/>
    <!-- ... -->
    <item href="EPUB/xhtml/cover.xhtml" id="icover" media-type="application/xhtml+xml"/>
    <item href="page_styles.css" id="page_css" media-type="text/css"/>
    <item href="stylesheet.css" id="css" media-type="text/css"/>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="icover"/>
    <itemref idref="i01_Titlepage"/>
    <itemref idref="i12_Chapter01_02"/>
    <!-- ... -->
    <itemref idref="i46_Index1"/>
  </spine>
  <guide>

    <!-- Added after marking the right image with calibre editor as cover -->
    <reference type="cover" href="EPUB/images/9781284157468_FC.jpg"/>
    <!-- Added after marking the right image with calibre editor as cover -->
  </guide>

</package>
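The lookup that the added `<meta name="cover">` line enables can be sketched as follows. This is only an illustration in Python under my own assumptions, not DarkThumbs' actual code; the function name `cover_href_from_meta` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the <package> document, taken from the sample above.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def cover_href_from_meta(opf_xml):
    """Return the manifest href named by <meta name="cover">, or None.

    Hypothetical helper: opf_xml is the content of the OPF file.
    """
    root = ET.fromstring(opf_xml)

    # Step 1: <meta name="cover" content="..."/> holds a manifest item id.
    cover_id = None
    for meta in root.iterfind("opf:metadata/opf:meta", OPF_NS):
        if meta.get("name") == "cover":
            cover_id = meta.get("content")
            break
    if cover_id is None:
        return None

    # Step 2: the manifest item with that id carries the image href,
    # which is relative to the location of the OPF file.
    for item in root.iterfind("opf:manifest/opf:item", OPF_NS):
        if item.get("id") == cover_id:
            return item.get("href")
    return None
```

For the file above this would return `EPUB/images/9781284157468_FC.jpg`; without the added `<meta>` line it returns `None`, which would explain a fallback to some other image.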

I noticed that these EPUBs have this kind of structure: ./EPUB/xhtml/cover.xhtml. Content of cover.xhtml:

<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xmlns:ns="http://www.w3.org/2001/10/synthesis" lang="en-us" xml:lang="en-us">
    <head>
        <title>xxxxx</title>
        <link href="../css/theme/night.css" rel="alternate stylesheet" title="night" type="text/css"/>
        <link href="../css/theme/sepia.css" rel="alternate stylesheet" title="sepia" type="text/css"/>
        <meta content="urn:uuid:4df3029a-89df-4306-89d2-a9be503b6aa4" name="Adept.expected.resource"/>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
        <link href="../../stylesheet.css" rel="stylesheet" type="text/css"/>
        <link href="../../page_styles.css" rel="stylesheet" type="text/css"/>
    </head>
    <body aria-label="cover" epub:type="cover" class="calibre">
        <div class="calibre1">
            <img alt="" src="../images/9781284157468_FC.jpg" class="calibre2"/>
        </div>
    </body>
</html>

I don't know whether this is a legitimate, standard EPUB way to identify the cover page, and through it the cover image.

Case 2

I have another case where the first image was picked as the cover (a fallback rule, I guess) despite what its root OPF file says:

Mich_9780307790361_epub_opf_r1.opf :

<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="PrimaryID" version="2.0">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title>...</dc:title>
        <dc:language>en-US</dc:language>
        <dc:identifier id="PrimaryID" opf:scheme="ISBN">...</dc:identifier>
        <dc:creator opf:file-as="..." opf:role="aut">...</dc:creator>
        <dc:publisher>...</dc:publisher>
        <dc:date opf:event="publication">2011-03-16</dc:date>
        <dc:rights>...</dc:rights>
        <meta content="cover-image" name="cover"/>
        <meta content="1.2" name="epubcheckversion"/>
        <meta content="2012-02-02" name="epubcheckdate"/>
        <description xmlns="http://purl.org/dc/elements/1.1/">...</description>
    </metadata>
    <manifest>
        <item href="Mich_9780307790361_epub_ncx_r1.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
        <item href="OEBPS/Mich_9780307790361_epub_cvi_r1.htm" id="cvi" media-type="application/xhtml+xml"/>
        <!-- ... -->
        <item href="OEBPS/Mich_9780307790361_epub_css_r1.css" id="css" media-type="text/css"/>
        <item href="OEBPS/images/Mich_9780307790361_epub_001_r1.jpg" id="f001" media-type="image/jpeg"/>
        <!-- ... -->
        <item href="OEBPS/images/Mich_9780307790361_epub_103_r1.jpg" id="f103" media-type="image/jpeg"/>
        <item href="OEBPS/images/Mich_9780307790361_epub_cvt_r1.jpg" id="fcvi" media-type="image/jpeg"/>
        <item href="OEBPS/images/Mich_9780307790361_epub_L02_r1.jpg" id="fL02" media-type="image/jpeg"/>
        <item href="OEBPS/images/Mich_9780307790361_epub_L03_r1.jpg" id="fL03" media-type="image/jpeg"/>
        <item href="OEBPS/images/Mich_9780307790361_epub_tp_r1.jpg" id="ftp" media-type="image/jpeg"/>
        <item href="OEBPS/page-template.xpgt" id="page" media-type="application/vnd.adobe-page-template+xml"/>
        <item href="OEBPS/fonts/CharisSILB.ttf" id="font1" media-type="application/x-font-ttf"/>
    </manifest>
    <spine toc="ncx">
        <itemref idref="cvi" linear="yes"/>
        <!-- ... -->
    </spine>
    <guide>
        <reference href="OEBPS/Mich_9780307790361_epub_c01_r1.htm" title="Start" type="start"/>
        <reference href="OEBPS/Mich_9780307790361_epub_cvi_r1.htm" title="cover" type="cover"/>
        <reference href="OEBPS/images/Mich_9780307790361_epub_cvt_r1.jpg" title="thumbimagestandard" type="thumbimagestandard"/>
        <reference href="OEBPS/Mich_9780307790361_epub_cop_r1.htm" title="Copyright" type="copyright-page"/>
        <reference href="OEBPS/Mich_9780307790361_epub_toc_r1.htm" title="Table of Contents" type="toc"/>
    </guide>
</package>

Content of OEBPS/Mich_9780307790361_epub_cvi_r1.htm :

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:ops="http://www.idpf.org/2007/ops" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <head>
        <title>xxx</title>
        <link href="Mich_9780307790361_epub_css_r1.css" rel="stylesheet" type="text/css"/>
        <meta content="application/xhtml+xml; charset=utf-8" http-equiv="Content-Type"/>
        <meta content="urn:uuid:6da38ed9-7c6a-45a7-982b-f093c8d71f91" name="Adept.expected.resource"/>
    </head>
    <body style="margin-top: 0px; margin-left: 0px; margin-right: 0px; margin-bottom: 0px; text-align: center;">
        <div class="cover">
            <img alt="" src="images/Mich_9780307790361_epub_cvt_r1.jpg"/>
        </div>
    </body>
</html>

=> The cover image seems to be reachable through the OPF -> cover HTML file.

What may be a problem for DarkThumbs (or for the EPUB standard): at the EPUB root, the OPF file is not named content.opf but has a specific name, Mich_9780307790361_epub_opf_r1.opf, and so does the NCX file, Mich_9780307790361_epub_ncx_r1.ncx. The name is a concatenation: {partial author name}_{isbn}_epub_opf_r1.opf

META-INF/container.xml is correctly targeting it :

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="Mich_9780307790361_epub_opf_r1.opf" media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>

So container.xml => OPF file => HTML file for the cover => image inside the element with the "cover" class looks like the path to follow to get the right cover image.
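That chain can be sketched in Python as below. It is a rough illustration, not how DarkThumbs does it: the names `resolve` and `find_cover_via_guide` are made up, the cover page is assumed to be well-formed XHTML, error handling for missing archive members is left out, and the first `<img>` on the page is taken instead of checking for a "cover" class.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
PAGE_SUFFIXES = (".htm", ".html", ".xhtml")

def resolve(base_file, href):
    """Resolve href relative to the archive path of the file containing it."""
    href = unquote(href.split("#", 1)[0])  # drop any fragment, undo %-escapes
    return posixpath.normpath(posixpath.join(posixpath.dirname(base_file), href))

def find_cover_via_guide(epub):
    """Return the archive path of the cover image, or None (hypothetical helper).

    epub is a path or file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(epub) as zf:
        # 1. container.xml names the OPF file; it need not be content.opf.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None or not rootfile.get("full-path"):
            return None
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))

        # 2. Look for <reference type="cover"> inside <guide>.
        href = next(
            (ref.get("href")
             for ref in opf.iterfind("opf:guide/opf:reference", OPF_NS)
             if ref.get("type") == "cover"),
            None,
        )
        if not href:
            return None
        target = resolve(opf_path, href)

        # 3. The reference may point straight at an image (as in Case 1).
        if not target.lower().endswith(PAGE_SUFFIXES):
            return target

        # 4. Otherwise it is a cover page (as in Case 2): take its first <img>.
        page = ET.fromstring(zf.read(target))
        for element in page.iter():
            if element.tag.rsplit("}", 1)[-1] == "img" and element.get("src"):
                return resolve(target, element.get("src"))
    return None
```

Calling `find_cover_via_guide("book.epub")` on the Case 2 layout should give `OEBPS/images/Mich_9780307790361_epub_cvt_r1.jpg`. Cover pages that are not well-formed XML would need a more forgiving HTML parser than `xml.etree.ElementTree`.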

fire-eggs commented 3 years ago

Could you provide the original problematic EPUB files? Thanks!

fire-eggs commented 3 years ago

Looks like both of these are using the <guide> tag. The <guide> tag is deprecated by the EPUB 3.0 standard but ideally should still be supported.

I'll have to do some further research and see if I can find some samples. Thanks for the report!

fire-eggs commented 3 years ago

Found one. It appears to most closely match your "case 2".

Wells,H.G.War and the Future[War and the Future]_epub.zip

I'll also note that some readers fail to read it.

I don't have any instances matching your "case 1" ...

s-kocher commented 3 years ago

Regarding the <guide> tag: Calibre Editor uses it when you select any image, right-click, and pick "Choose <Selected image name> as cover image". That fixes the cover image selected by DarkThumbs, so it looks like it's already implemented, even if it's an old EPUB format.

I'll check whether I can find other similar cases; the two ebooks I found for case 1 are huge ...

s-kocher commented 3 years ago

Maybe this one: not exactly the same, but it looks like a similar rule, with an HTML / XHTML file named cover. Agile Testing.epub.zip

container.xml

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE container PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfile xmlns="urn:oasis:names:tc:opendocument:xmlns:container" full-path="OEBPS/html/9780321616937.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

opf :

<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="p9780321616937" version="2.0">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Agile Testing</dc:title>
        <meta content="cover-image" name="cover"/>
    </metadata>
    <manifest>
        <item href="toc.ncx" id="ncxtoc" media-type="application/x-dtbncx+xml"/>
        <item href="bk01-toc.html" id="htmltoc" media-type="application/xhtml+xml"/>
        <item href="cover.html" id="cover" media-type="application/xhtml+xml"/>
        <item href="graphics/9780321616937.jpg" id="cover-image" media-type="image/jpeg"/>
        <item href="graphics/16937.jpg" id="thumbnail" media-type="image/jpeg"/>
    </manifest>
    <spine toc="ncxtoc">
        <itemref idref="cover"/>
        <itemref idref="frontm"/>
    </spine>
    <guide>
        <reference href="bk01-toc.html" title="Table of Contents" type="toc"/>
    </guide>
</package>

cover.html

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Cover page</title>
        <link href="9780321616937.css" rel="stylesheet" type="text/css"/>
        <meta content="urn:uuid:23ed53c4-16c5-479f-841d-c02ec382c541" name="Adept.expected.resource"/>
    </head>
    <body>
        <div>
            <p class="cover">
                <img alt="cove-image" src="graphics/9780321616937.jpg"/>
            </p>
        </div>
    </body>
</html>

s-kocher commented 3 years ago

My case 1, with many resources removed to make it lighter (so Calibre's check will probably complain about broken links...): Botany - An Introduction to Plant Biology, Seventh Edition (2021) light.epub.zip

fire-eggs commented 3 years ago

Much obliged!

fire-eggs commented 3 years ago

It turns out your two samples illustrated different problems.

"Agile Testing" should have worked with DarkThumbs 1.0, except for an assumption about how paths inside the archive worked. "Agile Testing" has a slightly different path layout, which needed to be handled.
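As an illustration of the kind of path handling involved (a Python sketch only; I am assuming the general rule that manifest hrefs are relative to the folder holding the OPF file, and `archive_path` is a made-up name, not a DarkThumbs function):

```python
import posixpath

def archive_path(opf_path, href):
    """Turn a manifest href into a path inside the EPUB archive.

    The href is interpreted relative to the directory of the OPF file,
    wherever container.xml says that file lives.
    """
    opf_dir = posixpath.dirname(opf_path)  # "" when the OPF is at the root
    return posixpath.normpath(posixpath.join(opf_dir, href))

# OPF in a subfolder, as in "Agile Testing":
print(archive_path("OEBPS/html/9780321616937.opf", "graphics/9780321616937.jpg"))
# prints "OEBPS/html/graphics/9780321616937.jpg"

# OPF at the archive root, as in the "Mich_..." sample:
print(archive_path("Mich_9780307790361_epub_opf_r1.opf",
                   "OEBPS/images/Mich_9780307790361_epub_cvt_r1.jpg"))
# prints "OEBPS/images/Mich_9780307790361_epub_cvt_r1.jpg"
```

Code that only expects one of these two layouts will look for the image in the wrong place for the other.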

For "Botany", I was able to find the cover image without chasing down the <guide> referenced HTML file. There is an <item> tag in the <manifest> section with an id of "cover-image", and it points to the same image that the <guide>-referenced HTML file leads to.

I intend to still tackle the <guide>-HTML variant because I have one of those in my collection. Which is why I'm leaving this issue open, even though "Agile Testing" and "Botany" should be working with DarkThumbs 1.2.

s-kocher commented 3 years ago

1.2 works well 👍, most ebooks I have are fine now :) Thanks a lot !