daisy / pipeline-scripts

!! NOTE: This project is now part of the pipeline-modules project !! | Script modules for the default DAISY Pipeline 2 distribution.
GNU Lesser General Public License v3.0
6 stars 5 forks source link

The "assert validity" option on dtbook-to-epub3 does not seem to work #67

Closed josteinaj closed 7 years ago

josteinaj commented 9 years ago

This might affect dtbook-to-zedai, zedai-to-epub3 or underlying utility steps as well.

josteinaj commented 9 years ago

And maybe the option should be false by default (although default values are not exposed through the web ui yet) to avoid confusing error messages about the intermediary format (zedai in this case).

bertfrees commented 7 years ago

I've added a test and it seems this has been fixed in the meantime.

although default values are not exposed through the web ui yet

@josteinaj This has now changed. Do you still think the default should be "false"?

josteinaj commented 7 years ago

I think the input files to scripts should be validated by default. For intermediary formats, I'm not sure.

bertfrees commented 7 years ago

What about outputs? What do you do in your dtbook-to-epub3 with outputs and intermediary formats?

bertfrees commented 7 years ago

When you say "validated" you mean "fail when not valid" right?

josteinaj commented 7 years ago

In the nordic scripts I validate inputs, outputs and intermediary formats. I don't know if it's a good or bad thing though. Unless the user knows that there exists an intermediary format, the intermediary format validation errors can be confusing.

validated vs fail when not valid => I was thinking either validate and fail, or not validate at all, but yeah, giving validation errors without aborting script execution would probably be better than not validating at all.

bertfrees commented 7 years ago

Also if the input is valid I think you shouldn't fail when outputs (or intermediary files) are invalid, because then a user doesn't get anything and might not know what to do.

bertfrees commented 7 years ago

Fixed by https://github.com/daisy/pipeline-scripts/pull/104

mccallum-sgd commented 7 years ago

Just tested this.

I used the engineering DTBook sample from pipeline#376, which is invalid XML (see pipeline#376 comment).

I then ran the DTBook to EPUB 3 script two times on this invalid sample with assert-validity=true and assert-validity=false...

assert-validity=true: job done, no errors but validation fails as expected:

"File is not well-formed XML."

asser-validity=false: job error as expected, messages:

WARNING: org.xml.sax.SAXParseException; … /CK_12_Engineering__An_Introd.xml; lineNumber: 78; columnNumber: 117; character content of element “pagenum” invalid; must be an integer
ERROR: ERR:XC0053:error code?
ERROR: ERR:XC0053:error code?
ERROR: ERR:XC0053:error code?
ERROR: ERR:XC0053:error code?
ERROR: null

assert-validity option for DTBook to EPUB 3 is working correctly.

Quick suggestion: I feel the terminology "assert validity" could be confusing from an end-user standpoint. The option description and documentation is clear, but the option's name itself could be taken to mean that the user is asserting the validity of the document, not the script. I know it was modeled after existing XProc steps like this one, but something along the lines of "Abort on Validation" or "Strict Validation" might be more clear and user-friendly.