Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification

Clarify handling of localization #35

Closed: dmj closed this issue 8 months ago.

dmj commented 2 years ago

ISO Schematron provides localization with the @xml:lang attribute. From the specification it is not clear whether a conformant processor is required to perform language fixup (https://www.w3.org/TR/xinclude/#language) when …
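For readers unfamiliar with the term: in XInclude, language fixup roughly means that when an included top-level element's in-scope language differs from that of the element it is inserted into, an explicit xml:lang is added to the included copy (xml:lang="" if it had no language), so the language is neither lost nor silently changed. The following is a minimal Python sketch using xml.etree.ElementTree, assuming the caller already knows the in-scope language at the insertion point. The function names are hypothetical and not taken from any Schematron processor.

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def in_scope_lang(root, target):
    """Return the xml:lang in scope for target (nearest on self or an ancestor), or None."""
    # ElementTree keeps no parent links, so walk down from the root and
    # carry the inherited language along each path.
    stack = [(root, root.get(XML_LANG))]
    while stack:
        elem, lang = stack.pop()
        if elem is target:
            return lang
        for child in elem:
            stack.append((child, child.get(XML_LANG, lang)))
    raise ValueError("target is not inside root")


def include_with_language_fixup(src_root, src_elem, parent_lang):
    """Copy src_elem for insertion under a parent whose in-scope language is parent_lang."""
    src_lang = in_scope_lang(src_root, src_elem)
    clone = copy.deepcopy(src_elem)
    # None and "" both mean "no language", so compare them as equal.
    if (src_lang or "") != (parent_lang or ""):
        # Record the source language explicitly; "" marks "no language".
        clone.set(XML_LANG, src_lang or "")
    return clone


SCH = "http://purl.oclc.org/dsdl/schematron"
src = ET.fromstring(
    f'<sch:diagnostics xmlns:sch="{SCH}" xml:lang="de">'
    '<sch:diagnostic id="D1">Pflichtfeld fehlt</sch:diagnostic>'
    "</sch:diagnostics>"
)
diag = src.find(f"{{{SCH}}}diagnostic")
fixed = include_with_language_fixup(src, diag, parent_lang="en")
print(fixed.get(XML_LANG))  # de
```

The German diagnostic inherits xml:lang="de" from its sch:diagnostics container; once copied into an English context, the fixup makes that inherited language explicit on the copy.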

This is also related to #7: an svrl:failed-assert is supposed to inherit the language property from the sch:assert element.

dmj commented 2 years ago

See also https://github.com/schxslt/schxslt/issues/262

rjelliffe commented 2 years ago

It is fair to say that the ISO Schematron Standard "supports" localization but does not "provide" it. Whatever internationalization an implementation provides is the business of its implementer and platform.

xml:lang is part of the XML Specification (https://www.w3.org/TR/xml/#sec-lang-tag): "In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute (https://www.w3.org/TR/xml/#dt-attr) named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document."

There are various alternative methods for localizing Schematron diagnostics:

1) INCLUDE-TIME: The schema uses sch:include to pull in an external file containing the diagnostics, and a custom URI resolver finds the correct file for the language.

2) PRE-INCLUDE-TIME: When processing the schema before inclusion, as a compilation option, the XML engine copies the language-appropriate diagnostics file to the expected file name.

3) POST-INCLUDE-TIME: Diagnostics files for all languages are included, but sch:diagnostics elements not marked with the appropriate language are removed.

4) POST-INCLUDE-TIME-WITH-FALLBACK: sch:diagnostic elements with the same @id are culled so that there remains

a) only the diagnostic in the desired language, otherwise

b) the sch:diagnostic[not(@xml:lang)], otherwise

c) the diagnostic in a fallback lingua franca, such as @xml:lang[contains(., "en")], or

d) some implementation decision, such as generating an error.

5) POST-REPORT-TIME-FILTER: In this approach, sch:diagnostic/@id is treated not as a unique ID but as a KEY. So a reference such as sch:assert/@diagnostics="D1" finds all the relevant sch:diagnostic[@id="D1"] elements and puts them all into the SVRL. The processor of the SVRL must then select the appropriate diagnostic for the language.

6) POST-REPORT-TIME-CONSTRUCT: In this approach, the sch:assert/@diagnostics references are kept in the SVRL but not evaluated into svrl:diagnostic-reference outputs. Instead, only the sch:name and sch:value-of expressions inside sch:diagnostic are evaluated, and the results are put in some foreign element (with the XPath as key). When post-processing the SVRL, the post-processor then selects which sch:diagnostics file to use for the language and performs simple text substitution. (This, of course, allows different languages to put the generated text in a different order.)
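Method 4 (POST-INCLUDE-TIME-WITH-FALLBACK) is essentially a small selection algorithm, so here is a minimal Python sketch of it over a single sch:diagnostics element, reading b) as a diagnostic that carries no xml:lang at all. Two assumptions beyond the text: language matching follows the XPath lang() rule ("de-CH" matches a wanted "de") rather than plain contains(), and an sch:diagnostic inherits xml:lang from its container. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SCH = "{http://purl.oclc.org/dsdl/schematron}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_matches(lang, wanted):
    """Case-insensitive match in the style of XPath lang(): "de-CH" matches "de"."""
    lang, wanted = lang.lower(), wanted.lower()
    return lang == wanted or lang.startswith(wanted + "-")


def first(items, pred):
    # Explicit None checks: an Element with no children is falsy, so `or` chains are unsafe.
    return next((item for item in items if pred(item)), None)


def cull_diagnostics(diagnostics, wanted, fallback="en"):
    """Keep exactly one sch:diagnostic per @id inside one sch:diagnostics element."""
    inherited = diagnostics.get(XML_LANG)

    def lang_of(diag):
        # Own xml:lang, else the container's; "" is treated as "no language".
        return diag.get(XML_LANG, inherited) or None

    groups = {}
    for diag in diagnostics.findall(SCH + "diagnostic"):
        groups.setdefault(diag.get("id"), []).append(diag)

    for diag_id, candidates in groups.items():
        # a) the desired language
        chosen = first(candidates, lambda d: lang_of(d) is not None and lang_matches(lang_of(d), wanted))
        if chosen is None:  # b) a diagnostic with no language
            chosen = first(candidates, lambda d: lang_of(d) is None)
        if chosen is None:  # c) the fallback lingua franca
            chosen = first(candidates, lambda d: lang_of(d) is not None and lang_matches(lang_of(d), fallback))
        if chosen is None:  # d) implementation decision: here, an error
            raise ValueError(f"no usable diagnostic for id {diag_id!r}")
        for diag in candidates:
            if diag is not chosen:
                diagnostics.remove(diag)


doc = ET.fromstring(
    '<sch:diagnostics xmlns:sch="http://purl.oclc.org/dsdl/schematron">'
    '<sch:diagnostic id="D1" xml:lang="en">Missing</sch:diagnostic>'
    '<sch:diagnostic id="D1" xml:lang="de">Fehlt</sch:diagnostic>'
    '<sch:diagnostic id="D2">Neutral text</sch:diagnostic>'
    '<sch:diagnostic id="D2" xml:lang="fr">Manquant</sch:diagnostic>'
    '<sch:diagnostic id="D3" xml:lang="en">English only</sch:diagnostic>'
    "</sch:diagnostics>"
)
cull_diagnostics(doc, "de")
print([d.text for d in doc])  # ['Fehlt', 'Neutral text', 'English only']
```

Each @id ends up with exactly one diagnostic, so later ID lookups stay unambiguous.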
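For method 5 (POST-REPORT-TIME-FILTER), the work moves to whatever consumes the SVRL. Below is a minimal Python sketch of such a post-processor. It assumes, and this is not something SVRL is confirmed here to do, that each svrl:diagnostic-reference carries the xml:lang of the sch:diagnostic it came from. The function name select_diagnostics is hypothetical.

```python
import xml.etree.ElementTree as ET

SVRL = "{http://purl.oclc.org/dsdl/svrl}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_matches(lang, wanted):
    """Case-insensitive match in the style of XPath lang()."""
    lang, wanted = lang.lower(), wanted.lower()
    return lang == wanted or lang.startswith(wanted + "-")


def select_diagnostics(report, wanted):
    """Keep one svrl:diagnostic-reference per @diagnostic key in each result."""
    # Materialize the result list before mutating the tree.
    results = [*report.iter(SVRL + "failed-assert"), *report.iter(SVRL + "successful-report")]
    for result in results:
        groups = {}
        for ref in result.findall(SVRL + "diagnostic-reference"):
            groups.setdefault(ref.get("diagnostic"), []).append(ref)
        for refs in groups.values():
            matching = [r for r in refs if lang_matches(r.get(XML_LANG, ""), wanted)]
            # Prefer the wanted language; otherwise keep the first so no message is lost.
            keep = matching[0] if matching else refs[0]
            for ref in refs:
                if ref is not keep:
                    result.remove(ref)
    return report


SAMPLE = (
    '<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">'
    '<svrl:failed-assert test="@id" location="/doc">'
    "<svrl:text>Missing id</svrl:text>"
    '<svrl:diagnostic-reference diagnostic="D1" xml:lang="en">'
    "<svrl:text>No id given</svrl:text></svrl:diagnostic-reference>"
    '<svrl:diagnostic-reference diagnostic="D1" xml:lang="de">'
    "<svrl:text>Keine ID angegeben</svrl:text></svrl:diagnostic-reference>"
    "</svrl:failed-assert>"
    "</svrl:schematron-output>"
)
report = select_diagnostics(ET.fromstring(SAMPLE), "de")
print([r.findtext(SVRL + "text") for r in report.iter(SVRL + "diagnostic-reference")])
# ['Keine ID angegeben']
```

Keeping the first reference when no language matches is one possible design choice; a stricter consumer could just as well raise an error, mirroring option d) of method 4.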
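Method 6 (POST-REPORT-TIME-CONSTRUCT) boils down to keyed text substitution: the validator records the evaluated sch:name / sch:value-of results keyed by their XPath, and the post-processor fills in a per-language template. A minimal Python sketch follows. The {xpath} placeholder syntax, the plain-dict layout of the recorded values and templates, and the function names are all hypothetical, since the comment leaves the foreign element's format open.

```python
import re

# Hypothetical placeholder syntax: {xpath}, where the XPath text is the key
# under which the validator stored the evaluated value.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render(template, values):
    """Replace each {xpath} placeholder with its value from validation time."""
    def substitute(match):
        expr = match.group(1)
        if expr not in values:
            raise KeyError(f"no evaluated value for {expr!r}")
        return values[expr]
    return PLACEHOLDER.sub(substitute, template)


def localize(diagnostic_id, values, templates, language, fallback="en"):
    """Pick the template table for the language (or the fallback) and fill it in."""
    table = templates.get(language) or templates[fallback]
    return render(table[diagnostic_id], values)


# Values as a validator might have recorded them, keyed by XPath.
values = {"name()": "item", "@price": "120"}
templates = {
    "en": {"D1": "The {name()} costs {@price}, which is too much."},
    "de": {"D1": "{@price} ist zu viel für {name()}."},
}
print(localize("D1", values, templates, "de"))  # 120 ist zu viel für item.
```

Because placeholders are looked up by key rather than by position, the German template can put the price first, which is exactly the reordering freedom the comment mentions.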

At ISO, the Japanese national body asked for an example of language-dependent diagnostics as an annex to the standard. Ideally, any XSLT implementation supports at least what is needed to make the example in the annex work. I think the minimum to support is the POST-INCLUDE-TIME suppression of any sch:diagnostics[$LANGUAGE (:is specified :)][@xml:lang][not(contains(@xml:lang, $LANGUAGE))] and sch:diagnostic[$LANGUAGE (:is specified :)][@xml:lang][not(contains(@xml:lang, $LANGUAGE))]
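The minimum suppression rule above translates almost directly into code. Here is a minimal Python sketch that mirrors the two XPath predicates, including their use of contains() rather than a stricter language-tag match; the function name and the use of None for "$LANGUAGE not specified" are hypothetical choices.

```python
import xml.etree.ElementTree as ET

SCH = "{http://purl.oclc.org/dsdl/schematron}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
LOCALIZABLE = {SCH + "diagnostics", SCH + "diagnostic"}


def suppress_other_languages(schema, language=None):
    """Remove sch:diagnostics / sch:diagnostic whose xml:lang does not contain language."""
    if not language:
        return schema  # $LANGUAGE not specified: keep everything
    # Snapshot the tree first; ElementTree has no parent links, so remove via each parent.
    for parent in list(schema.iter()):
        for child in list(parent):
            lang = child.get(XML_LANG)
            # Same test as [@xml:lang][not(contains(@xml:lang, $LANGUAGE))]
            if child.tag in LOCALIZABLE and lang is not None and language not in lang:
                parent.remove(child)
    return schema


SAMPLE = (
    '<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">'
    '<sch:diagnostics xml:lang="fr">'
    '<sch:diagnostic id="D1">Manquant</sch:diagnostic>'
    "</sch:diagnostics>"
    "<sch:diagnostics>"
    '<sch:diagnostic id="D1" xml:lang="en">Missing</sch:diagnostic>'
    '<sch:diagnostic id="D1" xml:lang="ja">Kesson</sch:diagnostic>'
    "</sch:diagnostics>"
    "</sch:schema>"
)
schema = suppress_other_languages(ET.fromstring(SAMPLE), "en")
print([d.text for d in schema.iter(SCH + "diagnostic")])  # ['Missing']
```

Unlike method 4, this rule has no fallback: a diagnostic that exists only in another language simply disappears, so it fits best when every diagnostic is available in every supported language.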

Personally, I think this is something that is better part of an industry best-practice profile, not part of the standard itself, for the sake of minimalism and effectiveness.

Regards Rick

dmj commented 2 years ago

"Personally, I think this is something that is better part of an industry best-practice profile, not part of the standard itself, for the sake of minimalism and effectiveness."

Agreed.

I think all that's needed is a short normative note that a Schematron processor must perform language fixup when incorporating external definitions (sch:include, sch:extends) and when instantiating abstract patterns, abstract rules, diagnostics, and properties.

The standard should indeed not discuss how to achieve this or how to implement localization.

AndrewSales commented 8 months ago

Added a new clause under "6 Semantics".