epub3-to-daisy202: text-only daisy 2.02 should also have SMIL files

josteinaj commented 9 years ago

Media Overlay should be generated for all content in the EPUB before converting to DAISY 2.02. That's a useful step for other scripts as well.

bertfrees commented 5 years ago

DAISY 2.02 has some special requirements for the SMILs so we may have to generate the SMILs after the HTML conversion. In either case the code should be written in such a way that it is easily reusable.

The requirements are:

- Headings (h1 - h6) and page numbers (span) in the HTML should each have their own par in the SMIL. This constrains the granularity of the SMIL, at least for these specific elements.
- Headings must be properly nested. This condition should be fulfilled automatically if the SMIL generation is based on the DAISY 2.02 HTML (not the EPUB 3 HTML).

Generating the SMILs from scratch is relatively straightforward. But "augmenting" existing SMILs is more challenging, because of possible granularity mismatches.
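To illustrate the from-scratch case, here is a minimal Python sketch (not the Pipeline's actual implementation) that walks a DAISY 2.02 HTML tree and emits one par per heading and per page number. It assumes un-namespaced XHTML, that every relevant element already has an id, and that page numbers are spans whose class starts with "page-". The names `is_sync_point` and `build_smil` are hypothetical, and the SMIL head element is left out for brevity.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def is_sync_point(elem):
    """Return True for elements that need their own par in the SMIL."""
    if elem.tag in HEADINGS:
        return True
    # Assumption: page numbers are marked up as <span class="page-...">.
    return elem.tag == "span" and elem.get("class", "").startswith("page-")


def build_smil(html_root, html_name):
    """Build a text-only SMIL body with one par per heading / page number."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")
    # iter() visits elements in document order, so the pars follow
    # the reading order of the HTML.
    for elem in html_root.iter():
        elem_id = elem.get("id")
        if elem_id and is_sync_point(elem):
            par = ET.SubElement(
                seq, "par", {"endsync": "last", "id": "par_" + elem_id}
            )
            ET.SubElement(par, "text", {"src": f"{html_name}#{elem_id}"})
    return smil
```

Because the pars are derived from the DAISY 2.02 HTML in document order, the heading nesting in the SMIL simply mirrors whatever nesting the HTML already has.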

Headings need to have their own par. So if an existing SMIL references segments within a heading, i.e. when it is too fine-grained, a solution is to merge all the segments in the heading. If the segments do not add up to the complete heading, or if the audio elements cannot be combined because they reference different audio files or because the clips don't follow each other, we have to error out.
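The merge rule can be sketched roughly as follows. This is only an illustration under simplifying assumptions: each segment is modelled as a character range within the heading plus one audio clip, clip times are seconds as floats, and "follow each other" means the next clip begins where the previous one ends, within a small tolerance. `Clip`, `Segment`, `GranularityError` and `merge_heading_segments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    src: str      # audio file name
    begin: float  # clip begin, in seconds
    end: float    # clip end, in seconds


@dataclass(frozen=True)
class Segment:
    text_start: int  # character offset within the heading (inclusive)
    text_end: int    # character offset within the heading (exclusive)
    clip: Clip


class GranularityError(Exception):
    """Raised when the segments cannot be merged into a single par."""


def merge_heading_segments(segments, heading_length, tolerance=0.001):
    """Merge all segments of one heading into a single clip, or error out."""
    ordered = sorted(segments, key=lambda s: s.text_start)
    # 1. The segments must add up to the complete heading.
    pos = 0
    for seg in ordered:
        if seg.text_start != pos:
            raise GranularityError("segments do not cover the whole heading")
        pos = seg.text_end
    if not ordered or pos != heading_length:
        raise GranularityError("segments do not cover the whole heading")
    # 2. The clips must be in the same file and follow each other.
    merged = ordered[0].clip
    for seg in ordered[1:]:
        clip = seg.clip
        if clip.src != merged.src:
            raise GranularityError("clips reference different audio files")
        if abs(clip.begin - merged.end) > tolerance:
            raise GranularityError("clips do not follow each other")
        merged = Clip(merged.src, merged.begin, clip.end)
    return merged
```

For example, two segments covering characters 0-5 and 5-12 of a 12-character heading, with clips 0.0-1.0 and 1.0-2.5 in the same file, merge into a single clip 0.0-2.5.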

If an existing SMIL is too coarse-grained for the headings, we can also error out, because that seems unlikely to happen.

For page numbers, however, it is not so unlikely that the SMIL is too coarse-grained, because page numbers may appear inside paragraphs or even inside sentences (or words). One possible solution is to omit such page numbers from the NCC.
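The different handling of the two coarse-grained cases could look roughly like this. The sketch assumes we already know which element ids have a par of their own in the SMIL; `ncc_entries` and `CoarseSmilError` are hypothetical names, not part of any existing code.

```python
class CoarseSmilError(Exception):
    """Raised when a heading has no par of its own in the existing SMIL."""


def ncc_entries(sync_points, ids_with_own_par):
    """Select the ids that can be listed in the NCC.

    sync_points: list of (kind, element_id), kind is "heading" or "page".
    ids_with_own_par: set of element ids that have their own par.
    """
    entries = []
    for kind, elem_id in sync_points:
        if elem_id in ids_with_own_par:
            entries.append(elem_id)
        elif kind == "heading":
            # Considered unlikely in practice, so we simply fail.
            raise CoarseSmilError(f"no par for heading {elem_id!r}")
        # A page number without its own par is silently left out.
    return entries
```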

bertfrees commented 5 years ago

See PR https://github.com/daisy/pipeline-scripts/pull/153.

epub3-to-daisy202: text-only daisy 2.02 should also have SMIL files #86