daisy / pipeline-scripts

!! NOTE: This project is now part of the pipeline-modules project !! | Script modules for the default DAISY Pipeline 2 distribution.
GNU Lesser General Public License v3.0

epub3-to-daisy202 produces invalid daisy202 from valid epub3 #150

Open bertfrees opened 5 years ago

bertfrees commented 5 years ago

See https://github.com/daisy/pipeline/issues/529#issuecomment-426289149.

bertfrees commented 5 years ago

It turns out that the NCC file generated by epub3-to-daisy202 has several errors:

bertfrees commented 5 years ago

Various other validation issues (Jostein converted the C00000.epub sample book and validated the result with Pipeline 1)

I just tried the Pipeline 2 validator. It also discovers some of the issues, but not all of them:

  • File not found: C00000-2-toc.html (in file:/C:/Users/jostein/Desktop/C00000/C00000-01-cover.html)
  • bad value for attribute 'name' (in ncc)
  • bad value for attribute 'content' (in ncc)

@rdeltour says there is a Java part in Pipeline 1 that was not ported to Pipeline 2, but it seems there are several other differences.

bertfrees commented 5 years ago

The Schematron errors aren't visible in the Pipeline 2 report because the Schematron rules are embedded in the RelaxNG files, and Pipeline 2 doesn't support embedded Schematron.
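To see which rules are affected, one could list the Schematron patterns embedded in a given RelaxNG schema. The following is only a minimal sketch: the function name is hypothetical, it accepts both the Schematron 1.5 and the ISO Schematron namespace because I have not checked which one the schemas use, and it is not how either Pipeline processes these files.

```python
import xml.etree.ElementTree as ET

# Both namespaces are accepted; which one the DAISY 2.02 schemas
# actually use is an assumption left open here.
SCHEMATRON_NAMESPACES = {
    "http://www.ascc.net/xml/schematron",    # Schematron 1.5
    "http://purl.oclc.org/dsdl/schematron",  # ISO Schematron
}


def embedded_schematron_patterns(rng_source):
    """Return the name (or id) of every Schematron <pattern> found in a RelaxNG document."""
    root = ET.fromstring(rng_source)
    found = []
    for element in root.iter():
        if not element.tag.startswith("{"):
            continue  # element is in no namespace
        uri, local_name = element.tag[1:].split("}", 1)
        if uri in SCHEMATRON_NAMESPACES and local_name == "pattern":
            # Schematron 1.5 labels patterns with @name, ISO Schematron with @id.
            found.append(element.get("name") or element.get("id") or "")
    return found
```

An empty result for a schema would mean there is nothing embedded to miss; a non-empty one gives the list of patterns that a validator without embedded-Schematron support would skip.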

bertfrees commented 5 years ago

  • <meta name="viewport" content="width=device-width"/>: value of attribute "name" is invalid
  • <meta name="dcterms:modified" content="2018-10-22T15:57:05+00:00" />: value of attribute "name" is invalid

@josteinaj These two are added in epub3-to-daisy202 (opf-to-html-metadata.xsl). See https://github.com/daisy/pipeline-scripts/commit/d51dcad4b98ed69326663c582e9d9ebf45156392. But it is invalid. Not according to Pipeline 1 though. What should I do with this?

bertfrees commented 5 years ago

  • Attribute 'xmlns:...' must be declared for element type 'smil'
  • Attribute 'xmlns:epub' must be declared for element type 'html'

Are these really validation issues, or are these shortcomings of Pipeline 1? Of course we can make sure that there are no unneeded namespace declarations in the files (EDIT: I did this now), but still... Should they cause errors? I can't reproduce this with the Pipeline 2 validator.
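For checking the output for leftover declarations, a rough sketch like the following could help. It is an illustration only: the function name is hypothetical, and it treats a namespace as "used" only when an element or attribute name is in it, so prefixes that appear solely inside attribute values or text would be reported as unused.

```python
import io
import xml.etree.ElementTree as ET


def unused_namespaces(xml_bytes):
    """Return {prefix: uri} for namespaces that are declared but never used
    in an element or attribute name."""
    declared = {}  # uri -> prefix (the default namespace has prefix "")
    used = set()
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "start-ns"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            declared[uri] = prefix
        else:
            # ElementTree expands qualified names to "{uri}local".
            for name in [payload.tag, *payload.attrib]:
                if name.startswith("{"):
                    used.add(name[1:].split("}", 1)[0])
    return {prefix: uri for uri, prefix in declared.items() if uri not in used}
```

Run over each generated HTML and SMIL file, an empty dictionary would indicate that no stray `xmlns:...` declarations remain.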

bertfrees commented 5 years ago

  • File type not allowed in DAISY 2.02 fileset: ... (expected a html, smil, mp2, mp3, wav, jpg, gif, png or css file type)

Where can I find more info about the allowed file types? http://www.daisy.org/publications/specifications/daisy_202.html talks about the allowed audio file types, but it doesn't mention any image file types.
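Until the source of that list is clear, the check itself is easy to reproduce outside Pipeline 1. The sketch below takes the extensions straight from the error message quoted above; the function name is hypothetical, and matching case-insensitively is my assumption, not confirmed Pipeline 1 behaviour.

```python
from pathlib import PurePosixPath

# Extensions copied from the Pipeline 1 error message; not taken from the spec.
ALLOWED_EXTENSIONS = {"html", "smil", "mp2", "mp3", "wav", "jpg", "gif", "png", "css"}


def disallowed_files(paths):
    """Return the paths whose extension is not in ALLOWED_EXTENSIONS."""
    rejected = []
    for path in paths:
        # Assumption: extensions are compared case-insensitively.
        extension = PurePosixPath(path).suffix.lstrip(".").lower()
        if extension not in ALLOWED_EXTENSIONS:
            rejected.append(path)
    return rejected
```

For example, `disallowed_files(["ncc.html", "cover.jpeg", "nav.xhtml"])` would flag `cover.jpeg` and `nav.xhtml`, which are the kinds of files an EPUB 3 source is likely to contribute.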

bertfrees commented 5 years ago

The SMIL-related issues are not visible in the Pipeline 2 report because these validation results are simply ignored. See 2b04ed2. @josteinaj Do you remember whether this was on purpose?

bertfrees commented 5 years ago

  • Could not compare calculated duration to stated duration since this information is missing in the NCC

In Pipeline 1, time checks are implemented in Java (ValidatorImplD202). In Pipeline 2 this is done in XSLT/XProc.

  • Invalid pseudo-function

This is also implemented in Java in Pipeline 1 (CssFileImpl).

bertfrees commented 5 years ago

See PR: https://github.com/daisy/pipeline-modules/pull/1