AGIuk / Schematron

The Schematron files to support GEMINI 2.3 validation

Modified supplemental schema to produce warnings #2

Closed petermcdaid closed 4 years ago

petermcdaid commented 4 years ago

The Supplemental Schematron schema is for recommendations only, so I modified all asserts to be reports. This involved reversing the logic of each test, i.e. =1 becomes !=1.
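As a minimal sketch of what reversing the logic means: an assert produces output when its test is false, whereas a report produces output when its test is true, so the test has to be negated to flag the same documents. The rule context, tested element and messages below are hypothetical and are not copied from the GEMINI schema.

```xml
<!-- Hypothetical rule for illustration only; not copied from the GEMINI schema. -->
<sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
          context="gmd:MD_Metadata">

  <!-- Before: the assert fires when its test evaluates to FALSE. -->
  <sch:assert test="count(gmd:referenceSystemInfo) = 1">
    There should be exactly one spatial reference system.
  </sch:assert>

  <!-- After: the report fires when its test evaluates to TRUE,
       so the comparison is negated to flag the same documents. -->
  <sch:report test="count(gmd:referenceSystemInfo) != 1">
    There is not exactly one spatial reference system.
  </sch:report>

</sch:rule>
```

The two elements are shown side by side only for comparison; a real schema would keep one of them, and would declare the gmd prefix with an sch:ns element.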

PeterParslow commented 4 years ago

A question @petermcdaid, from one of the GEMINI Working Group.

"Asserts evaluate negative and reports positives. Asserts are for errors and Reports are to say what's passed. So the distinction really is what's being emphasized - 'passes' or 'fails'. So I think Asserts is still correct if the focus is on error reporting. It's a question of semantics as to whether a 'warning' is a 'success' or not."

So why does your team feel that, out of all the successes that could be reported, it's appropriate to just report (as successes) the warnings?

petermcdaid commented 4 years ago

As they are recommendations rather than requirements, it didn't seem appropriate for them to show as errors. This is, I think, consistent with other parts of the schema, which report where the metadata has deviated from a recommendation.

It is not required for the Spatial reference to be in the default list, so it shouldn't fail the validation if it is not, but the apparent purpose of the check is to inform the end user that it is not in there.

PeterParslow commented 4 years ago

Response from one of the other organisations that have used the Schematron in their implementation:

I'm of the opinion that it should be left as it is currently. It's really up to the metadata catalog administrator whether the supplemental schematron failures should be taken as full failures or just warnings.

nmtoken commented 4 years ago

It's possibly worth noting that I used asserts and reports in both the required Schematron and the supplemental. I wasn't thinking particularly semantically here; I just prefer, for example, test="gmd:description" in an assert over test="count(gmd:description) = 0" in a report.

Looking at Correct and Robust: Schematron’s assert versus report just now, I think I probably should have used more reports, but the post does go on to state:

Of course, none of this is set in concrete: you choose pragmatically

The schematrons are split deliberately into must pass and recommendations, irrespective of the use of reports over asserts; I can't see a need to refactor, it's just one style over another.

PeterParslow commented 4 years ago

The author of the 'version 1' Schematron files, now engaged on the MEDIN implementation, remarks:

"I’ve referred to ISO 19757-3 Schematron, which is one of the freely available standards. My assumption is that this is the definitive specification that is being followed in our and Gemini’s context. And I’ve looked at the Schematron schemas in the repo. But I’ve not read more widely than this.

Looking at the standard definition of assert and report, I find that they are both considered assertions. The key difference between them is:

Assert: The natural-language assertion shall be a positive statement of a constraint

Report: the natural-language assertion shall be a positive statement of a found pattern or a negative statement of a constraint

Natural-language assertion: natural-language statement expressing some part of a pattern

So they are sides of the same coin, I think. With ‘report’ having the additional ‘positive statement of a found pattern’ nature. I think my assessment is that, with reference to the Schematron standard only, it does not matter whether the Gemini supplemental schema uses ‘assert’ or ‘report’ elements. Since it was written originally using ‘assert’ I would not change it. Change might have an impact on implementers (though I don’t know the stage of implementation).

I’m intrigued by the title of the pull request. It is, ‘Modified supplemental schema to produce warnings’. This implies that there is an interpretation to be applied to the result of a validation and that a ‘report’ firing is maybe to be interpreted as a ‘warning’. I don’t know if this is specific to the supplemental schema or if it applies generally to the Gemini schemas. As far as I can tell, there is no statement in the standard (ISO 19757-3) which says how an ‘assert’ or ‘report’ firing in a validation should be interpreted.

I also don’t think that the ‘report’ elements used in https://github.com/AGIGemini/Schematron/blob/master/GEMINI_2.3_Schematron_Schema-v1.0.sch amount to warnings. I think if a ‘report’ fires it implies a validation failure. I think…

The standard does define an element named ‘diagnostic’ which is defined as: a natural-language message giving more specific details concerning a failed assertion, such as found versus expected values and repair hints.

Then Note 2 following says: typical values for the role attribute on a diagnostic element might be warning, caution or note.

So, it’s just possible that this element could be used to make it explicit how a ‘report’ or ‘assert’ firing in a validation should be interpreted (though the role attribute qualifies the diagnostic not the assert or the report…). With the usual caveat that this change would have a big impact on implementers. In semantic versioning, this would cause a major release increment. If indeed I’m right that this is how the diagnostic element is intended to be used – which I’m not 100% on.

I’ve looked at the Medin Schematron v3.0 and I see this in the header:

 2010-02-01 - Version 1.6
 All sch:report elements removed from the schema. These elements cause svrl:successful-report
 elements to be output in SVRL. oXygen 10.x interprets these elements as warnings while 11.1
 interprets them as errors. The intention in the context of this schema was that svrl:successful-report
 would be interpretted as information.

Which is interesting. It indicates how I originally considered a ‘report’ element would be used (my interpretation…). And shows how that rubbed up against an implementation. And that I can’t spell.
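For readers unfamiliar with SVRL, here is a rough sketch of the two result elements that the changelog refers to. It shows the same hypothetical check written both ways; the tests, location and messages are made up, and real validators may emit additional attributes. Nothing in either element states a severity, which is why different oXygen versions could interpret svrl:successful-report differently.

```xml
<!-- Hypothetical SVRL output; the tests, locations and messages are made up. -->
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern/>
  <svrl:fired-rule context="gmd:MD_Metadata"/>

  <!-- Produced by an sch:assert whose test evaluated to false. -->
  <svrl:failed-assert test="gmd:referenceSystemInfo"
                      location="/gmd:MD_Metadata[1]">
    <svrl:text>A spatial reference system should be given.</svrl:text>
  </svrl:failed-assert>

  <!-- Produced by an sch:report whose test evaluated to true. -->
  <svrl:successful-report test="count(gmd:referenceSystemInfo) = 0"
                          location="/gmd:MD_Metadata[1]">
    <svrl:text>No spatial reference system was found.</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>
```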

I understand that the supplemental schema is there to provide guidance and useful information to users. There’s merit in that. It points out things which could be changed but are not considered enough to render the metadata instance invalid. I don’t think it matters whether the ‘assert’ or ‘report’ element is used, from the point-of-view of the standard (ISO 19757-3). But the schema would need to have some overriding documentation which states the purpose and how the validation result should be interpreted. Bearing in mind that the standard (ISO 19757-3) says that, “A general Schematron validator is a function returning ‘valid’, ‘invalid’ or ‘error’”. What we’re envisioning here is a specific Schematron validator which additionally returns ‘information’. I think.

Most important for Medin I think is to consider implementation. Has anyone implemented this yet? Will this pull request have an impact on them? And is there sufficient information to allow the Gemini schemas, or the results of validations, to be interpreted correctly.

I like the fact that the Schematron is managed in github. It would be nice if the Medin Schematron were managed in a similar way. Not necessarily github, though that would be a nice solution."

PeterParslow commented 4 years ago

Taken together, that seems to imply that the Schematron standard does not draw a clear difference between assert & report - at least, not suggesting that some are more serious than others. "Passing" an assert is equivalent to not firing a "report"; and conversely, firing a report is equivalent to failing an assert.

Schematron.com describes them as "two varieties of assertion", distinguished like this: "Assert is used when you are stating what should be found as part of the pattern. The assertion text might be of the form “An X should have Y because ABC”. Is the document, or set of documents, or input/output what you expect. Report is used when you find something interesting as part of the pattern ... So you use this for feature extraction, and to report things that are not next expected in the normal run".

In the first (2006) & second (2016) editions of Schematron (ISO 19757-3), rules & reports can carry a flag attribute, intended "to convey state or severity information to a subsequent process" - but it is boolean. Later authors propose using the role attribute to provide more subtlety, with particular values (like "FAIL" or "WARN") - but they are not part of the standard.
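A rough sketch of how these two attributes might look in a rule. The rule itself, the flag name and the messages are hypothetical; the role values "WARN" and "FAIL" are only the informal convention mentioned above, so a consuming tool would have to be configured to recognise them.

```xml
<!-- Hypothetical rule; not copied from the GEMINI schema. -->
<sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
          context="gmd:MD_Metadata">

  <!-- role carries a severity hint. "WARN" has no meaning defined by the
       standard, so it only helps if the consuming code looks for it. -->
  <sch:assert test="gmd:referenceSystemInfo" role="WARN">
    A spatial reference system should be given.
  </sch:assert>

  <!-- flag names a boolean that becomes true when this report fires;
       it says that something happened, not how serious it is. -->
  <sch:report test="count(gmd:contact) = 0" role="FAIL" flag="hasFailures">
    No metadata point of contact was found.
  </sch:report>

</sch:rule>
```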

In the absence of these attributes, any distinction of severity appears to be in the hands of the code which is using the rules. Hence why we have the errors & warnings separated in two rule sets.
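One way the consuming code could apply that distinction is sketched below as a small XSLT 1.0 stylesheet run over the SVRL output of each rule set. It is an illustration under stated assumptions, not part of the GEMINI Schematron: the parameter name severity and its values are hypothetical. It treats failed asserts and fired reports identically and takes the severity label from the caller, who knows which rule set produced the results.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svrl="http://purl.oclc.org/dsdl/svrl">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: the caller passes ERROR for the required
       rule set and WARNING for the supplemental one. -->
  <xsl:param name="severity" select="'ERROR'"/>

  <xsl:template match="/">
    <!-- Failed asserts and fired reports are handled the same way. -->
    <xsl:for-each select="//svrl:failed-assert | //svrl:successful-report">
      <xsl:value-of select="$severity"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="normalize-space(svrl:text)"/>
      <xsl:text> (at </xsl:text>
      <xsl:value-of select="@location"/>
      <xsl:text>)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With xsltproc, for example, the label could be set per rule set with `xsltproc --stringparam severity WARNING label.xsl supplemental.svrl`, where label.xsl and supplemental.svrl are hypothetical file names.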

So @petermcdaid, would you accept this suggestion being rejected?

petermcdaid commented 4 years ago

I wasn't clear that the supplemental was purely for warnings when we raised the pull request.

As long as that is clear then, as one commenter says, an administrator can choose how to apply the results.

In an ideal world I think it would be best to have asserts as errors and reports for warnings, but I am by no means an expert in Schematron rules or validation, so if the consensus is to keep it as is, I am happy for the change to be rejected.

PeterParslow commented 4 years ago

We have now made that distinction of purpose explicit in the GitHub page, and will make it clearer in the documentation.