Bogus error about non-existent processing instruction matching [xX][mM][lL]

cmsmcq commented 6 years ago

Roma is objecting to an ODD file I have been submitting to it, with the error message

The processing instruction target matching "[xX][mM][lL]" is not allowed.

The only problem is that the ODD file in question has no content that violates the relevant rule of XML well-formedness: it has an XML declaration, some xml-stylesheet processing instructions, and some xml-model processing instructions, but no others.
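For reference, the well-formedness rule only reserves a processing instruction target that is exactly "xml" in any mix of upper and lower case. Longer names such as xml-stylesheet and xml-model are ordinary targets. A minimal Python sketch of that check follows; the function name is_reserved_pi_target is hypothetical and is not part of Roma or of any TEI tool.

```python
import re

# The reserved target is the three letters x, m, l in any case, and nothing else.
RESERVED = re.compile(r"[xX][mM][lL]")


def is_reserved_pi_target(target: str) -> bool:
    """Return True only when the whole target is 'xml' in some casing."""
    return RESERVED.fullmatch(target) is not None


# Targets named in the issue, plus two that really are reserved.
for target in ("xml-stylesheet", "xml-model", "xml", "XmL"):
    print(target, is_reserved_pi_target(target))
```

Because the match has to cover the whole target, none of the processing instructions listed above should trigger the rule on their own.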

At first I thought perhaps Roma was the victim of some obsolete infrastructure that didn't know about xml-model PIs, but removing all the processing instructions and the XML declaration failed to clear the error.

The ODD file I am trying to process is on the Web at http://uyghur.ittc.ku.edu/2018/05/atmo-schemas-PTS.xml

lb42 commented 6 years ago

I agree that this is a daft error message. It usually means that one of the temporary files web Roma generates is defective because of unrelated problems in your source, so the first step is to see whether your ODD generates a schema outside Roma (e.g. with the oXygen framework, or the command line teitorelaxng tool), and fix any problems arising there.

cmsmcq commented 6 years ago

I haven't used the Oxygen transformation scenario for a couple of weeks (like Roma, it has no visible way to specify which of several schemaSpec elements is to be processed), but the last time I tried, it had no problem with the ODD document. The package TEIC/Stylesheets likewise has no difficulty producing the three schemas from the ODD document.

lb42 commented 6 years ago

I think you can set the "schema" (or "selectedSchema"?) parameter in the transformation scenario to get the required behaviour from oXygen. It's reassuring to know that it works at the command line anyway.

I will leave it to someone else to comment on the web Roma problem: when I was on council, the general policy was that Roma was going to be imminently replaced by something else, so only life-threatening bugs would be fixed, but I'm not sure how far that's got ... maybe @raffazizzi can comment!

peterstadler commented 5 years ago

Dear @cmsmcq, the issue is probably outdated(?) but I gave it a try just now.

When I downloaded your ODD file and uploaded it to https://roma2.tei-c.org, it complained about duplicate ID values for 'tiers-grammar-segmented'. After removing it, Roma successfully consumed the ODD file and was able to output HTML documentation (that's all I tested).

So, is it fair to close the issue?

cmsmcq commented 5 years ago

Thank you, @peterstadler -- it appears that something changed in my ODD file or in Roma between the filing of the issue and now. Tests with the ODD file current in May 2018 and with the current ODD suggest that the salient change was with Roma: I'm no longer getting the error message described in the original description of the issue.

On the other hand, what I'm getting (and what I assume you got when you tested with the current Odd document) is not a Relax NG schema.

The output begins with two XML declarations (which may explain the original error message) and continues with an XML element named {}error, with a msg attribute reading "A sequence of more than one item is not allowed as the first argument of fn:concat() (@prefix, @prefix, @prefix, ...)" and content reading

pl.psnc.dl.ege.exception.ConverterException: A sequence of more than one item is not allowed as the first argument of fn:concat() (@prefix, @prefix, @prefix, ...) 
    at pl.psnc.dl.ege.tei.TEIConverter.convert(TEIConverter.java:174)
    at pl.psnc.dl.ege.component.NamedConverter.convert(NamedConverter.java:44)
    at pl.psnc.dl.ege.ConversionPerformer.run(ConversionPerformer.java:45)
    at java.lang.Thread.run(Thread.java:748)
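The guess that the doubled XML declaration lies behind the original message can be checked with any conforming parser: a declaration that is not at the very start of the document is not a declaration at all, so the parser meets what looks like a processing instruction with the reserved target. The following is a minimal sketch, assuming Python's standard-library parser as a stand-in for whatever parser Roma uses. The sample strings are made up for illustration, and the wording of the error differs from parser to parser.

```python
import xml.etree.ElementTree as ET

# Illustrative inputs only; these are not the actual Roma output.
SINGLE = '<?xml version="1.0"?>\n<error msg="demo"/>'
DOUBLED = '<?xml version="1.0"?>\n' + SINGLE


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print("single declaration:", parses(SINGLE))
print("doubled declaration:", parses(DOUBLED))
```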

This jogs a vague memory that at some point in my work on the Odd document I learned that my interpretation of what the Guidelines said about prefixes was not the interpretation embedded in Roma, though I cannot now recall whether it was a question of namespace prefixes (e.g. on Schematron rules) or the prefixes (like "tei_" supplied by the Odd processors to munge names), and running diff on different versions of the Odd file has not helped me identify what was changed.

It's possible of course that I ran out of patience and just changed my local copy of the Odd processor.

[Pause.]

More than possible, likely. More than likely: I did change the stylesheet odd2odd.xsl, which tries to generate a schematron namespace prefix using the expression

concat(ancestor::tei:schemaSpec//sch:ns[@uri=$myns]/@prefix,':')

which fails if any tei:schemaSpec ancestor chances to have more than one descendant named sch:ns with the appropriate namespace name as the value of the uri attribute. Since pretty much all of my constraint elements bind all relevant namespaces locally for legibility, the stylesheet's assumption that there will only ever be one sch:ns element at the end of the sch:ns step is badly out of step with reality.
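The failure mode can be reproduced outside XSLT. The sketch below is a Python analogue, not the stylesheet's own code: the sample ODD fragment, the example.org namespace, and the function names are all made up for illustration. strict_prefix mimics the single-item requirement of fn:concat(), and tolerant_prefix shows one possible repair (take the first match), which is not necessarily the fix adopted in TEIC/Stylesheets.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
SCH = "http://purl.oclc.org/dsdl/schematron"
MY_NS = "http://example.org/ns"  # hypothetical namespace, for illustration only

# Two constraints that each bind the same namespace locally.
ODD = f"""<schemaSpec xmlns="{TEI}" xmlns:sch="{SCH}" ident="demo">
  <constraintSpec ident="c1" scheme="schematron">
    <constraint><sch:ns prefix="ex" uri="{MY_NS}"/></constraint>
  </constraintSpec>
  <constraintSpec ident="c2" scheme="schematron">
    <constraint><sch:ns prefix="ex" uri="{MY_NS}"/></constraint>
  </constraintSpec>
</schemaSpec>"""


def prefixes_for(schema_spec, uri):
    """Collect @prefix from every descendant sch:ns whose @uri matches."""
    return [ns.get("prefix") for ns in schema_spec.iter(f"{{{SCH}}}ns")
            if ns.get("uri") == uri]


def strict_prefix(schema_spec, uri):
    """Refuse a multi-item sequence, as fn:concat() does for its arguments."""
    found = prefixes_for(schema_spec, uri)
    if len(found) > 1:
        raise ValueError("more than one item as first argument of concat()")
    return (found[0] if found else "") + ":"


def tolerant_prefix(schema_spec, uri):
    """One possible repair: use the first matching prefix only."""
    found = prefixes_for(schema_spec, uri)
    return (found[0] if found else "") + ":"


root = ET.fromstring(ODD)
print(prefixes_for(root, MY_NS))
print(tolerant_prefix(root, MY_NS))
```

Taking the first match silently assumes that every local binding of a given namespace uses the same prefix. If two constraints bound the same namespace to different prefixes, a more careful repair would be needed.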

Since Roma is still not processing the ODD correctly, I think the issue should probably not be closed, though fixing it probably will involve first fixing odd2odd.xsl and then putting the corrected version into service in the Roma system.

peterstadler commented 5 years ago

Thank you very much for the background! Now I see more clearly :)

Indeed, it seems like a Stylesheets issue so I created https://github.com/TEIC/Stylesheets/issues/366. I'll leave it open here until the Stylesheets issue is resolved and deployed to Roma.

TEIC / Roma-Antiqua

Bogus error about non-existent processing instruction matching [xX][mM][lL] #26