TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org
Other
276 stars 88 forks source link

objectIdentifier_minimal test is both complex and unnecessary #2260

Closed sydb closed 2 years ago

sydb commented 2 years ago

The report test of the "objectIdentifier_minimal" constraint reads

not(count(*) gt 0)

Well, since the result of a count() cannot be negative (or fractional), saying “is not greater than zero” is the same as saying “is zero”, which would seem to be a lot simpler. So either

<sch:report test="count(*) eq 0">     OR     <sch:assert test="count(*) ne 0">

would be a better way to express this, I think. But more importantly, the content model of <objectIdentifier> is

( model.placeNamePart | institution | repository | collection | idno | msName | objectName | altIdentifier | address )*

If we just change that * (zero or more) at the end to + (one or more), the entire constraint specification could be dropped, no?

ebeshero commented 2 years ago

Yes—seems funny that the constraint is there at all! I wonder if we have more Schematron tests like that trying to count more than zero children?

sydb commented 2 years ago

At first blush, no:

      2 count(*)>1
      1 count(descendant::tei:lg|descendant::tei:l|descendant::tei:gap) > 0
      1 count(tei:depth)> 1
      1 count(tei:height)> 1
      1 count(tei:lem) &lt; 2
      1 count(tei:width)> 1
      1 not(count(*) gt 0)                                    # <== this one is the one this ticket is about
      1 @target and count( child::* ) > 0

(Just looking for //tei:constraint//@test[ contains( ., 'count') ]. The 1st column there is the number of occurrences in p5subset.xml.)

sydb commented 2 years ago

@jamescummings — Is there a strong reason to keep the Schematron constraint rather than rely on the content model, here? After all, they are slightly different. Besides the fact that the Schematron gives a mildly more informative message (not a remotely convincing argument IMHO), the Schematron does allow

<objectIdentifier>
  <kml:Placemark xmlns:kml="http://www.opengis.net/kml/2.2">
    <kml:name>Sea Organ</kml:name>
    <kml:description>
      an architectural sound art object located in Zadar, Croatia and an experimental musical instrument,
      which plays music by way of sea waves and tubes located underneath a set of large marble steps.
      — Wikipedia
    </kml:description>
    <kml:Point>
      <kml:coordinates>15.22, 44.117222, 0</kml:coordinates>
    </kml:Point>
  </kml:Placemark>
</objectIdentifier>

whereas the proposed content model would not. But in order to allow foreign elements (in this case KML, because I could not figure out the right encoding for a location in Dublin Core) the user would need to customize with an ODD anyway (or pre-process the document before validation, or maybe use NVDL).

jamescummings commented 2 years ago

@jamescummings — Is there a strong reason to keep the Schematron constraint rather than rely on the content model, here?

Apologies, @sydb, didn't see this until now. There was someone when it was created that was suggesting objectIdentifier should be allowed to be empty with just a corresp attribute for some reason. That is why the constraint was done as a poorly constructed schematron. (To make it easier to ignore) But I think that was a hypothetical suggestion in any case. I've no objections to just making the content model + rather than *