Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification

Behavior of pattern/@documents #21

Open xatapult opened 3 years ago

xatapult commented 3 years ago

The ISO skeleton implementation resolves a pattern/@documents value relative to the source XML document's location. SchXslt resolves it relative to the schema location.

The standard leaves this undefined. However, I think the ISO interpretation makes more sense: When a source document contains a relative reference to some other document, this path is usually relative to the source document. Resolving it relative to the schema document (which could be anywhere) does IMHO not make much sense.

It should be specified unambiguously.
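To make the difference concrete, here is a minimal sketch in Python of how the same relative reference resolves under each interpretation. The URIs and the relative value are hypothetical and chosen only for illustration; `urljoin` stands in for whatever IRI resolution an implementation actually performs.

```python
from urllib.parse import urljoin

# Hypothetical locations, used only for illustration.
instance_uri = "https://example.org/data/book/toc.xml"
schema_uri = "https://example.org/schemas/book.sch"

# A relative value that pattern/@documents might evaluate to.
relative_ref = "chapters/ch1.xml"

# ISO skeleton behaviour: resolve against the source (instance) document.
resolved_against_instance = urljoin(instance_uri, relative_ref)

# SchXslt behaviour as described in this issue: resolve against the schema.
resolved_against_schema = urljoin(schema_uri, relative_ref)

print(resolved_against_instance)
print(resolved_against_schema)
```

The two results point at different directories, so a schema that validates correctly with one implementation may fail to find its subordinate documents with the other.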

rjelliffe commented 3 years ago

Yes, it was definitely intended to be relative to the document, as the motivating use case was to validate (unzipped) ZIP archives or any document format with an initial TOC file or reliable file name conventions, such as EPUB or OOXML.

Let's distinguish three kinds of links possible in Schematron, with (new suggested) names:

1) Schema links. You can use include and extends to resolve links. Relative links are relative to the current schema document: compile time. (Multiple levels of linking possible??)

2) Validation links. You use pattern/@documents. Relative links are relative to the current document: run time. (Only one level of linking is possible.)

3) Data links. You pull in some XML document with information you want to access, e.g. into a variable. But you cannot traverse the document to validate it directly. (The rule/@context expressions do not operate on them and so can never be fired by them.)

3a) Compile time. Use include inside a variable. Links are relative to the current schema document. (Multiple levels of linking possible??)

3b) Run-time document-relative. Use document() in an XPath. Links are relative to the current document. (You can use information from accessing one document using document() to construct another document() link; multiple, fixed levels of links are possible. There might be some way to use for each in XPath to traverse a chain of links. If you defined a function, you could do transitive closures, I suppose, with an unbounded number of levels of links.)

3c) Run-time absolute. Use document() in an XPath.

In other words, there is no resolution at runtime relative to the schema. I understand this is something that people may want to do: e.g. to be able to put in a table of values next to the schema, and have the schema dynamically pull those in. However, to me the appropriate way to do this is to use a parameter to the schema, and construct an absolute path using 3c).

It would be a good idea to clarify the standard on these.
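The parameter-plus-absolute-path approach suggested for 3c) can be sketched as follows. This is an illustration in Python rather than Schematron or XPath; the function name `absolute_data_link` and the parameter name `lookup_base` are hypothetical, and the sketch assumes the caller passes the base as an absolute URI.

```python
from urllib.parse import urljoin, urlsplit


def absolute_data_link(lookup_base: str, file_name: str) -> str:
    """Combine an absolute base URI parameter with a file name (case 3c)."""
    # Reject relative bases: the point of 3c) is that nothing is left
    # to be resolved against the schema or the instance at run time.
    if not urlsplit(lookup_base).scheme:
        raise ValueError("lookup_base must be an absolute URI")
    # A trailing slash makes urljoin treat the base as a directory.
    if not lookup_base.endswith("/"):
        lookup_base += "/"
    return urljoin(lookup_base, file_name)


# Hypothetical parameter value supplied by whoever invokes the validation.
print(absolute_data_link("https://example.org/schemas/tables", "codes.xml"))
```

Because the base is supplied from outside, the table of values can still sit next to the schema without the schema location ever being used as a run-time base.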

Regards Rick


dmj commented 2 years ago

Agreed. Section 5.4.10 should be amended as follows:

The optional documents attribute provides IRIs of the subordinate documents the rule contexts are relative to. If the expression evaluates to more than one IRI, then the pattern is sought in each of the documents. Relative IRIs are resolved to the base IRI of the original instance document. The documents attribute is evaluated in the context of the original instance document root.

rjelliffe commented 2 years ago

One important thing is that if multiple patterns are to validate the same document via @documents, we don't want to have to download that document each time: first, for efficiency, and second, because the document may have changed between retrievals.

So I have added text to the General Clarification page. It can probably be clarified further.

(Also, I have corrected the stripped out Xpaths in this issue.)

AndrewSales commented 1 year ago

> we don't want to have to download the same document each time. First, for efficiency, and second because the document may have changed between retrievals

Indeed, in the case of XPath-derived language implementations, this would align with the behaviour of fn:doc(), which is deterministic by default.
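The retrieve-once behaviour discussed above can be sketched as a small cache keyed by the resolved absolute URI. This is an illustrative Python sketch, not how any Schematron implementation is known to work; `DocumentCache` and `fake_loader` are hypothetical names, and the loader is a stand-in that records calls instead of fetching anything.

```python
from typing import Callable, Dict, List
from urllib.parse import urljoin


class DocumentCache:
    """Load each subordinate document at most once per validation run."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._docs: Dict[str, str] = {}

    def get(self, base_uri: str, ref: str) -> str:
        # Resolve first, so different spellings of the same target share an entry.
        absolute = urljoin(base_uri, ref)
        if absolute not in self._docs:
            self._docs[absolute] = self._loader(absolute)
        return self._docs[absolute]


# Stand-in loader that records every retrieval it is asked to perform.
calls: List[str] = []


def fake_loader(uri: str) -> str:
    calls.append(uri)
    return f"<doc uri='{uri}'/>"


cache = DocumentCache(fake_loader)
base = "https://example.org/data/book/toc.xml"  # hypothetical instance URI

# Two patterns referring to the same subordinate document.
first = cache.get(base, "chapters/ch1.xml")
second = cache.get(base, "./chapters/ch1.xml")

print(len(calls))  # number of actual retrievals
```

Both lookups return the same object from a single retrieval, so every pattern sees an identical view of the document even if the underlying resource changes during the run.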

AndrewSales commented 9 months ago

> Relative IRIs are resolved to the base IRI of the original instance document.

Amended in latest draft - thanks.

AndrewSales commented 9 months ago

> we don't want to have to download the same document each time. First, for efficiency, and second because the document may have changed between retrievals.

I've also added a recommendation about this -- I don't think we should mandate it, in case there are query languages which can't/don't support it, or it may be that a use case requires non-deterministic operation.