fread-ink / epub-cfi-resolver

A simple parser, resolver and generator for the EPUB-CFI format
GNU Affero General Public License v3.0
19 stars 3 forks source link

CFI Generator leading path question #10

Closed halmos closed 3 years ago

halmos commented 3 years ago

Hi,

Thank you for writing this much-needed library. I have a question regarding the generate method.

Using a standard epub opf file as a testing fixture:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" xml:lang="en" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
... 
  </metadata>
  <manifest>
    ...
    <!-- Chapters -->
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter_001" href="chapter_001.xhtml" media-type="application/xhtml+xml"/>
...
  </manifest>
  <spine>
    <!-- Frontmater -->
    <itemref idref="cover" linear="no"/>
    <itemref idref="toc" linear="no"/>
    <!-- Chapters -->
    <itemref linear="yes" idref="titlepage"/>
    <itemref id="test" linear="yes" idref="chapter_001"/>
...
  </spine>
</package>

And generating a cfi with:

epubCfi.generate([
              { node: spineNode, offset: undefined },
              { node: range.startContainer, offset: range.startOffset },
])

the epub-cfi-resolve generator produces a cfi in the form of

epubcfi(/2/6/4!/2/4/2/4/2,/2/2/1,/22/2/1:16)

Based on the cfi spec (http://idpf.org/epub/linking/cfi/epub-cfi-20111011.html#sec-path-examples), I don't expect to find that leading /2 in the path. The examples in the cfi docs start with /6/such that the first path element points to the position of the spine in the package document.

However, I noticed in the unit tests for epub-cfi-resolve also have the leading /2 in the path and this section of idpf cfi docs are not clearly defined so I'm not certain what the correct form should be.

It's also worth noting that epubjs's cfi parser objects to my cfi's with the leading /2, but functions as expected when that portion of the cfi is removed.

Could you clarify what the part of the path is resolving from?

halmos commented 3 years ago

Looking into this in more depth, I see that the leading /2 that epub-cfi-resolver is adding to the path, refers the the <package> node itself within the package.opf file.

The idpf docs could be more clear on this point but I believe that the path is expected to use the <package> node as the root:

For a Standard EPUB CFI, the leading step in the CFI must start with a slash (/) followed by an even number that references the spine child element of the Package Document's root package element.

In other words, the first step should be the index number of the spine element relative to the package element and not the index number of the package element relative to the document root.

Based on that, I believe that the epub-cfi-resolver generate method should not be adding the first /2 for the <package> element.

Juul commented 3 years ago

Thank you for the thorough explanation and pull request! I believe you are correct. It also looks like this affects parsing so I'll try to fix that soon. I think the pull request is probably good as is but I want to verify some corner cases first. Also, are you aware that your pull requests are not linked to your github account? Feel free change it or let me know if it doesn't matter.

Juul commented 3 years ago

Err what I mean is that the commits in your pull request are not linked to your github account. After having another look at the OPF specification I'm ready to accept your pull request.

halmos commented 3 years ago

Great! I fixed the account email linking - thanks for letting me.

Juul commented 3 years ago

It doesn't look like the parsing/resolving code is affected by this bug so I'm merging the pull request and closing this issue.