daisy / pipeline-scripts

!! NOTE: This project is now part of the pipeline-modules project !! | Script modules for the default DAISY Pipeline 2 distribution.
GNU Lesser General Public License v3.0

daisy202-validator: attribute "shape" not allowed here #82

Closed egli closed 7 years ago

egli commented 9 years ago

When validating an export from obi I get the following strange error message:

org.xml.sax.SAXParseException; systemId: file:/home/eglic/src/mdr2/samples/dam15988_DAISY202/ncc.html; lineNumber: 31; columnNumber: 44; attribute "shape" not allowed here; expected attribute "id"

See the validation report for the full report.

I don't understand. The file ncc.html doesn't contain the word "shape" at all.

rdeltour commented 9 years ago

This attribute may be added by the HTML loader step (either by the HTML5 parser or by us). Is this with the latest v1.9 modules?

josteinaj commented 9 years ago

If the NCC is not well-formed XML, it will be parsed as HTML. I think the HTML parser might add this attribute. If the NCC is well-formed, then this shouldn't happen.

Maybe we should add a well-formed check in the validator?
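
A well-formedness check of the kind suggested here could look roughly like the following. This is only a minimal Python sketch, not the validator's actual code; the `is_well_formed` helper is hypothetical. Note that Python's parser does not fetch an external DTD, so named entities defined only there would not be resolved by this check.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Any syntax error (unclosed tag, mismatched tag, ...) lands here.
        return False
    return True


# An HTML-style unclosed <br> makes the document ill-formed as XML.
print(is_well_formed("<html><body><p>ok</p></body></html>"))
print(is_well_formed("<html><body><p>text<br></p></body></html>"))
```

A validator could issue a warning whenever such a check fails, instead of silently falling back to HTML parsing.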

rdeltour commented 9 years ago

@josteinaj yeah, DAISY 2.02 mandates XHTML 1.0 transitional, so we should issue a warning if it's not the case

egli commented 9 years ago

@rdeltour yes this is with the latest v1.9 modules.

The ncc.html looks well-formed to me.

josteinaj commented 9 years ago

There's a DTD reference at the top. When loading the ncc, the default value for the "shape" attribute is set from the DTD:

<!ELEMENT a %a.content;>
<!ATTLIST a
  %attrs;
  charset     %Charset;      #IMPLIED
  type        %ContentType;  #IMPLIED
  name        NMTOKEN        #IMPLIED
  href        %URI;          #IMPLIED
  hreflang    %LanguageCode; #IMPLIED
  rel         %LinkTypes;    #IMPLIED
  rev         %LinkTypes;    #IMPLIED
  accesskey   %Character;    #IMPLIED
  shape       %Shape;        "rect"
  coords      %Coords;       #IMPLIED
  tabindex    %Number;       #IMPLIED
  onfocus     %Script;       #IMPLIED
  onblur      %Script;       #IMPLIED
  target      %FrameTarget;  #IMPLIED
  >

The "shape" value should be valid in DAISY 2.02 though (XHTML 1.0), so I'd say it's a bug in the validator.
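
The effect can be reproduced outside the Pipeline. Below is a minimal Python sketch, assuming a cut-down internal DTD subset in place of the real XHTML 1.0 Transitional DTD; the sample document and the `attributes_of_first_link` helper are made up for illustration. The parser reports a `shape` attribute on the `a` element even though the element in the source text only carries `href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ncc.html. The internal subset declares the same
# default for "shape" that the XHTML DTD quoted above declares.
NCC = """<!DOCTYPE html [
  <!ATTLIST a shape (rect|circle|poly|default) "rect">
]>
<html><body><h1 id="t1"><a href="t1.smil#x">Title</a></h1></body></html>"""


def attributes_of_first_link(doc):
    """Parse `doc` and return the attributes reported for the first <a>."""
    root = ET.fromstring(doc)
    return dict(root.find(".//a").attrib)


# The returned dict includes the DTD-supplied default next to "href".
print(attributes_of_first_link(NCC))
```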

xpilgrim commented 9 years ago

I also have this issue. In my ncc, no shape attribute is defined. https://github.com/xpilgrim/daisy-creator-magazin

rdeltour commented 7 years ago

Up. I was reminded of this issue recently by Mayu (ATDO). @bertfrees (or @josteinaj) do you have the time to re-look into it and propose a fix for the next release?

mccallum-sgd commented 7 years ago

Just came across this bug again while validating the daisy202 pipeline sample, same exception:

Validated as DAISY 2.02

org.xml.sax.SAXParseException; systemId: file:/D:/CS/GitHub/pipeline-samples-master/daisy202/dontworrybehappy/ncc.html; lineNumber: 29; columnNumber: 83; attribute "shape" not allowed here; expected attribute "id"

bertfrees commented 7 years ago

@rdeltour Fixed it. See https://github.com/daisy/pipeline-scripts/commit/0d8d3b6e81766a2b43dcc88b5fde74e03d4cb231. I don't know why that line was commented out though. I hope this doesn't break anything.

bertfrees commented 7 years ago

OK I could trace it back to https://github.com/daisy/pipeline1/commit/caeb08ca44f6008bb4a95f652ed5d7d2f1a8f88b. But that doesn't give us much more info.

bertfrees commented 7 years ago

Romain thinks the real issue is that the shape attribute is being automatically added during parsing. We probably have to set a configuration option to disable this behavior in Calabash.

bertfrees commented 7 years ago

See "EXPAND_ATTRIBUTE_DEFAULTS" setting in http://www.saxonica.com/html/documentation/configuration/config-features.html.
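
To illustrate what a switch of this kind does, here is a minimal Python sketch. It uses the `specified_attributes` flag of Python's expat parser as an analogue; it is not Saxon or Calabash code, and the sample document and the `link_attributes` helper are hypothetical. With the flag off, DTD defaults are expanded; with it on, only attributes actually written in the document are reported.

```python
from xml.parsers import expat

# Hypothetical stand-in for ncc.html with a default for "shape" on <a>.
NCC = """<!DOCTYPE html [
  <!ATTLIST a shape (rect|circle|poly|default) "rect">
]>
<html><body><h1 id="t1"><a href="t1.smil#x">Title</a></h1></body></html>"""


def link_attributes(doc, specified_only):
    """Return the attributes expat reports for <a> elements in `doc`."""
    found = {}
    parser = expat.ParserCreate()
    # When set to 1, expat reports only attributes present in the source text
    # and skips values that come from DTD attribute declarations.
    parser.specified_attributes = 1 if specified_only else 0

    def on_start(name, attrs):
        if name == "a":
            found.update(attrs)

    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return found


print(link_attributes(NCC, specified_only=False))  # defaults expanded
print(link_attributes(NCC, specified_only=True))   # defaults suppressed
```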

bertfrees commented 7 years ago

@rdeltour OK, I fixed it. It required a change in calabash-adapter too, however, so I'm no longer confident that it doesn't break anything without some more testing. I should call it a day now, so I suggest we keep this for a bugfix release; otherwise there is too much delay?

rdeltour commented 7 years ago

@bertfrees your call: if you feel like the chance for regression bugs is quite low, it can be worth including the fix (and leaving any regression bug to a bugfix release). But if you're not confident enough, I'm OK to postpone the fix.

bertfrees commented 7 years ago

That's fine too. But I really have to go now so could you stage the new calabash-adapter and daisy202-validator?

https://github.com/daisy/pipeline-framework/tree/daisy202-validator-issue-82 needs to be rebased onto https://github.com/daisy/pipeline-framework/tree/release/v1.10.3, and for pipeline-scripts the only module to be staged is daisy202-validator so there was no release branch yet. Latest changes are in https://github.com/daisy/pipeline-scripts/tree/issue-82.

EDIT: no need to release daisy202-validator because no code was changed.

rdeltour commented 7 years ago

@bertfrees ok sure, I'll try to find some time

bertfrees commented 7 years ago

I'm afraid this caused a regression. For some reason Calabash now thinks it needs to use Saxon SA to run validate-with-xsd.