Sigil-Ebook / Sigil

Sigil is a multi-platform EPUB ebook editor
GNU General Public License v3.0
5.96k stars 578 forks source link

[Bug]: Generate NCX/Guide for epub2 not creating guide correctly when Landmarks are using anchors #776

Closed microcrash closed 1 month ago

microcrash commented 1 month ago

Bug Description

If the landmarks section includes id anchors on the same page, using Generate NCX/Guide for epub2 e-readers is only listing the last item from that page in the guide without using the id anchor in the path. Example landmark:

  <nav epub:type="landmarks" id="landmarks" hidden="">
    <h1>Landmarks</h1>
    <ol>
      <li>
        <a epub:type="cover" href="cover.xhtml">Cover</a>
      </li>
      <li>
        <a epub:type="titlepage" href="frontmatter.xhtml#title-page">Title Page</a>
      </li>
      <li>
        <a epub:type="copyright-page" href="frontmatter.xhtml#copyright">Copyright</a>
      </li>
      <li>
        <a epub:type="dedication" href="frontmatter.xhtml#dedication">Dedication</a>
      </li>
      <li>
        <a epub:type="acknowledgements" href="frontmatter.xhtml#acknowledgements">Acknowledgements</a>
      </li>
      <li>
        <a epub:type="toc" href="toc.xhtml">Table of Contents</a>
      </li>
      <li>
        <a epub:type="bodymatter" href="chapter1.xhtml">Start of Content</a>
      </li>
      <li>
        <a epub:type="bibliography" href="bibliography.xhtml">Bibliography</a>
      </li>
    </ol>
  </nav>

Produces the following guide on content.opf

  <guide>
    <reference type="cover" title="Cover" href="Text/cover.xhtml"/>
    <reference type="acknowledgments" title="acknowledgments" href="Text/frontmatter.xhtml"/>
    <reference type="toc" title="Table of Contents" href="Text/toc.xhtml"/>
    <reference type="text" title="Text" href="Text/chapter1.xhtml"/>
    <reference type="bibliography" title="Bibliography" href="Text/bibliography.xhtml"/>
  </guide>

It would have been expected to produce this instead:

  <guide>
    <reference type="cover" title="Cover" href="Text/cover.xhtml"/>
    <reference type="titlepage" title="Title Page" href="Text/frontmatter.xhtml#title-page"/>
    <reference type="copyright-page" title="Copyright Page" href="Text/frontmatter.xhtml#copyright"/>
    <reference type="dedication" title="Dedication" href="Text/frontmatter.xhtml#dedication"/>
    <reference type="acknowledgments" title="Acknowledgments" href="Text/frontmatter.xhtml#acknowledgements"/>
    <reference type="toc" title="Table of Contents" href="Text/toc.xhtml"/>
    <reference type="text" title="Text" href="Text/chapter1.xhtml"/>
    <reference type="bibliography" title="Bibliography" href="Text/bibliography.xhtml"/>
  </guide>

Platform (OS)

Windows (Default)

OS Version / Specifics

Windows 11

What version of Sigil are you using?

2.3.1

Any backtraces or crash reports

No response

dougmassay commented 1 month ago

That definitely sounds/looks like a bug. But I'm betting it's been around for quite a few releases. Let me make sure I can duplicate what you're experiencing.

dougmassay commented 1 month ago

Yes. It looks to me that the code in the following For Loop is only allowing one guide item per html resource:

https://github.com/Sigil-Ebook/Sigil/blob/master/src/MainUI/MainWindow.cpp#L2556

If I'm reading things correctly, it's processing all Landmarks in the NAV, but only one Landmark per html resource is ever going to get added to the OPF Guide.

I would think all GetLandmarkCodeForPaths() would need to be looped through rather than looping though all html resources and checking to see if there is a matching entry in GetLandmarkCodeForPaths().

I could be wrong though.

kevinhendricks commented 1 month ago

Yes, a bug. Older versions of Sigil allowed only one semantic type set per file. The code for Nav landmarks allows hrefs with fragments but did not properly return the fragments since it was not needed under that old assumption. So the nav code for only the final fragment written to the hashtable survived.

We will have to somehow generate a list of all landmark code types and their associated fragments (if any) for each resource.

We will have to modify NavProcessor to return that info and then modify the necessary downstream code.

kevinhendricks commented 1 month ago

Most often epub developers split frontmatter into its separate file pieces (using Sigil's split at cursor or marker capability) by function because each piece has its own role and typically starts on its own page. Then the one semantic code per "file" design works just fine.

But clearly the epub2 and epub3 specs do allow for more than one semantic code per file in both the opf guide and nav even if that typically does not make sense in many cases.

From examining this further, we really need to handle the general case of more than one semantic code per file. This will actually take a major rework and redesign of the code that adds and removes semantic codes so that an id (fragment) can be easily specified at the time of creation and deletion. This has been on my todo list for some time. This looks like a good time to remove this limitation.

kevinhendricks commented 1 month ago

A tentative fix for this has now been pushed to master. This fix is quite invasive as it changes the internal paradigm from one semantic (landmark/guide entry) per file to allowing multiple semantic entries based on file ids/fragments.

It will require lots of testing before any official release. It (or something equivalent) will be included in the next release of Sigil.

In the meantime, as a workaround, use Sigil's split at cursor or marker functionality to split frontmatter.xhtml into its constituent files and Sigil should handle setting semantics on each one just fine, and then creating the opf guide from the nav should also work.

Thank you for your bug report!