daisy / pipeline-scripts

!! NOTE: This project is now part of the pipeline-modules project !! | Script modules for the default DAISY Pipeline 2 distribution.
GNU Lesser General Public License v3.0

epub3-to-daisy202: resulting ncc.html is invalid #116

Open josteinaj opened 6 years ago

josteinaj commented 6 years ago

Steps to reproduce:

The NCC is based on the EPUB 3 navigation document, but it is not properly converted: most of the metadata is missing, and the content is mostly the same as in the navigation document.

input ncc:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="no" lang="no">
<head>
    <title>Title</title>
    <meta name="dc:title" content="(Title)" />
    <meta name="dc:creator" content="(Author)" />
    <meta name="dc:identifier" content="625904" />
    <meta name="dc:format" content="Daisy 2.02" />
    <meta name="dc:publisher" content="NLB" />
    <meta name="dc:date" content="2017-10-02" scheme="yyyy-mm-dd" />
    <meta name="dc:language" content="no" />
    <meta name="ncc:charset" content="utf-8" />
    <meta name="ncc:narrator" content="(Narrator)" />
    <meta name="ncc:producer" content="NLB" />
    <meta name="ncc:sourcePublisher" content="Gyldendal" />
    <meta name="ncc:pageNormal" content="80" />
    <meta name="ncc:pageSpecial" content="0" />
    <meta name="ncc:pageFront" content="0" />
    <meta name="ncc:maxPageNormal" content="81" />
    <meta name="ncc:tocItems" content="95" />
    <meta name="ncc:depth" content="1" />
    <meta name="ncc:totalTime" content="00:39:48" />
    <meta name="ncc:setInfo" content="1 of 1" />
    <meta name="ncc:multimediaType" content="audioFullText" />
    <meta name="ncc:files" content="77" />
    <meta name="ncc:generator" content="Hindenburg ABC Studio 1.26.2164" />
    <meta http-equiv="Content-type" content="application/xhtml+xml; charset=utf-8" />
</head>
<body>
    <h1 id="d1574e37-0" class="title"><a href="s001.smil#d1574e37-0">Title</a></h1>
    <h1 id="d1574e46-0"><a href="s002.smil#d1574e46-0">Lydbokavtalen</a></h1>
    <h1 id="hix00392"><a href="s003.smil#hix00392">Bokinformasjon</a></h1>
    <h1 id="hix00393"><a href="s004.smil#hix00393">Informasjon om DAISY-boka</a></h1>
    <span id="page-2" class="page-normal"><a href="s004.smil#page-2">2</a></span>
    <span id="page-3" class="page-normal"><a href="s004.smil#page-3">3</a></span>
    <span id="page-4" class="page-normal"><a href="s004.smil#page-4">4</a></span>
    <span id="page-5" class="page-normal"><a href="s004.smil#page-5">5</a></span>
    <span id="page-6" class="page-normal"><a href="s004.smil#page-6">6</a></span>
    <h1 id="d1574e122-0"><a href="s005.smil#d1574e122-0">Baksidetekst</a></h1>
    <span id="page-7" class="page-normal"><a href="s005.smil#page-7">7</a></span>
    <h1 id="h1_2"><a href="s006.smil#h1_2">Innhold</a></h1>
    <span id="page-8" class="page-normal"><a href="s006.smil#page-8">8</a></span>
    <span id="page-9" class="page-normal"><a href="s006.smil#page-9">9</a></span>
    <h1 id="h1_3"><a href="s007.smil#h1_3">Den store sydenturen</a></h1>
    ...
</body>
</html>

output ncc:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="no">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta name="dc:identifier" content="625904" />
    <title>(Title)</title>
</head>
<body>
    <div id="toc" class="nav">
        <ol xmlns:epub="http://www.idpf.org/2007/ops">
            <li><a id="d1574e37-0" href="625904.html#d1574e37-0">(Title)</a></li>
            <li><a id="d1574e46-0" href="625904.html#d1574e46-0">Lydbokavtalen</a></li>
            <li><a id="hix00392" href="625904.html#hix00392">Bokinformasjon</a></li>
            <li><a id="hix00393" href="625904.html#hix00393">Informasjon om DAISY-boka</a></li>
            <li><a id="d1574e122-0" href="625904.html#d1574e122-0">Baksidetekst</a></li>
            <li><a id="h1_2" href="625904.html#h1_2">Innhold</a></li>
            <li><a id="h1_3" href="625904.html#h1_3">Den store sydenturen</a></li>
            ...
        </ol>
    </div>
    <div id="page-list" class="nav">
        <ol xmlns:epub="http://www.idpf.org/2007/ops">
            <li><a id="page-2" href="625904.html#page-2">2</a></li>
            <li><a id="page-3" href="625904.html#page-3">3</a></li>
            <li><a id="page-4" href="625904.html#page-4">4</a></li>
            <li><a id="page-5" href="625904.html#page-5">5</a></li>
            <li><a id="page-6" href="625904.html#page-6">6</a></li>
            <li><a id="page-7" href="625904.html#page-7">7</a></li>
            <li><a id="page-8" href="625904.html#page-8">8</a></li>
            <li><a id="page-9" href="625904.html#page-9">9</a></li>
        </ol>
    </div>
</body>
</html>
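
The differences between the two listings can also be checked mechanically. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, and it assumes that a DAISY 2.02 NCC body should not contain the `ol`/`li` list markup used by the EPUB 3 navigation document. It reports the `meta` names that were lost and counts leftover list elements.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def meta_names(ncc_text):
    """Return the set of <meta name="..."> values in an NCC document."""
    root = ET.fromstring(ncc_text.encode("utf-8"))
    # <meta http-equiv="..."> has no name attribute and is skipped.
    return {m.get("name") for m in root.iter(XHTML + "meta") if m.get("name")}


def list_element_count(ncc_text):
    """Count <ol> and <li> elements, which hint at nav-document markup."""
    root = ET.fromstring(ncc_text.encode("utf-8"))
    list_tags = (XHTML + "ol", XHTML + "li")
    return sum(1 for element in root.iter() if element.tag in list_tags)


def compare_ncc(original, generated):
    """Summarize what the generated NCC lost compared to the original."""
    return {
        # Metadata present in the original but absent from the output.
        "missing_meta": sorted(meta_names(original) - meta_names(generated)),
        # Assumption: any list markup in the output is unconverted nav content.
        "list_elements": list_element_count(generated),
    }
```

Calling `compare_ncc(original_text, generated_text)` with the contents of the two files returns a dictionary with the keys `missing_meta` and `list_elements`; for a correctly converted NCC both would be empty or zero under the assumptions above.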

josteinaj commented 6 years ago

When I try to fix the output fileset manually, by replacing the generated NCC with the original NCC and updating the filename in the SMIL backlinks, I get this cryptic message:

java.lang.RuntimeException: com.xmlcalabash.core.XProcException: Reading result on main
    at org.daisy.common.xproc.calabash.impl.CalabashXProcPipeline.run(CalabashXProcPipeline.java:245) ~[na:na]
    at org.daisy.pipeline.job.Job.run(Job.java:216) ~[na:na]
    at org.daisy.pipeline.job.impl.DefaultJobExecutionService$1.run(DefaultJobExecutionService.java:113) ~[na:na]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
Caused by: com.xmlcalabash.core.XProcException: Reading result on main
    at com.xmlcalabash.core.XProcException.dynamicError(XProcException.java:160) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:247) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:136) ~[na:na]
    at com.xmlcalabash.runtime.XPipelineCall.run(XPipelineCall.java:94) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:236) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:136) ~[na:na]
    at com.xmlcalabash.runtime.XPipelineCall.run(XPipelineCall.java:94) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:236) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:136) ~[na:na]
    at org.daisy.common.xproc.calabash.impl.CalabashXProcPipeline.run(CalabashXProcPipeline.java:242) ~[na:na]
    ... 3 common frames omitted

It sounds like an empty sequence of documents is appearing in an XProc script at a point where an empty sequence is not allowed, probably because of a bad href somewhere. This should be reported as a validation error rather than as a stack trace.
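
To illustrate the kind of validation message meant here, the following is a minimal sketch of a pre-flight link check. It is not how the pipeline works: the function name and message format are hypothetical, and the bad-href cause is only a guess. It lists every link in the NCC whose target file is absent from the fileset.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XHTML = "{http://www.w3.org/1999/xhtml}"


def broken_hrefs(ncc_text, available_files):
    """Return readable messages for links whose target file is not in the fileset."""
    root = ET.fromstring(ncc_text.encode("utf-8"))
    messages = []
    for anchor in root.iter(XHTML + "a"):
        href = anchor.get("href", "")
        # Drop the fragment ("s001.smil#id" becomes "s001.smil").
        target, _fragment = urldefrag(href)
        if target and target not in available_files:
            messages.append(
                f"ncc.html: link '{href}' points to missing file '{target}'"
            )
    return messages
```

Here `available_files` is assumed to be a set of file names relative to the NCC, for example built from a directory listing of the fileset. An empty result means every link resolves to a file that exists.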

bertfrees commented 5 years ago

@josteinaj I'm confused. You opened this issue in 2017, but the behavior you describe doesn't match the code at that time. You already made improvements to the NCC in 2014, and since that commit the NCC is no longer generated from the nav document. Incidentally, in that commit you changed nav-to-ncc.xsl, but that file is not even used anymore.

josteinaj commented 5 years ago

That's strange. I guess I'll have to do some more testing to see what the current behavior is.