DILCISBoard / E-ARK-CSIP

E-ARK Common Specification for Information Packages
http://earkcsip.dilcis.eu
Creative Commons Attribution 4.0 International

Casing incongruity for "Other" content category type value #722

Open Sunday-Crunk opened 8 months ago

Sunday-Crunk commented 8 months ago

The casing of the content category type value "Other" is inconsistent across the specification, the fixed vocabularies, the METS profile, and the validators. I believe the ultimate source of the confusion is that the fixed content category vocabulary contains an entry for "Other", whereas the specification explicitly calls for "OTHER", which makes it unclear whether the two are intended to mean the same thing or different things.

CSIP2

The 'mets/@TYPE' attribute MUST be used to declare the category of the content held in the package, e.g. "Datasets", "Websites", "Mixes" , "Other", etc.. Legal values are defined in a fixed vocabulary. When the content category used falls outside of the defined vocabulary the 'mets/@TYPE' value must be set to "OTHER" and the specific value declared in 'mets/@csip:OTHERTYPE'. The vocabulary will develop under the curation of the DILCIS Board as additional content information type specifications are produced.

Read literally, the sentence:

"When the content category used falls outside of the defined vocabulary the 'mets/@TYPE' value must be set to "OTHER""

suggests that the values "Other" and "OTHER" serve different purposes ("Other" being a legal term in the fixed vocabulary, as the first part of CSIP2 states), but no other requirement description explicitly substantiates this.

CSIP3

This is compounded by CSIP3, which also uses the all-caps casing:

"When the 'mets/@TYPE' attribute has the value "OTHER" the 'mets/@csip:OTHERTYPE' attribute MUST be used to declare the content category of the package/representation. The value can either be "OTHER" or any other string that are not present in the vocabulary used in the 'mets/@TYPE' attribute."

METS profile

The XPaths in the CSIP METS profile also enforce all caps:

mets[@TYPE='OTHER']/@csip:OTHERTYPE
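XPath string comparison is case-sensitive, so this predicate matches only the all-caps form. The following is a minimal sketch of that behaviour using Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The `wrapper` element and the helper name `othertype_via_profile_xpath` are illustrative assumptions, not part of the profile or of either validator.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
CSIP_NS = "https://DILCIS.eu/XML/METS/CSIPExtensionMETS"


def othertype_via_profile_xpath(type_value):
    """Return csip:OTHERTYPE if mets[@TYPE='OTHER'] matches, else None."""
    # Hypothetical wrapper element: ElementTree evaluates paths relative to
    # the children of the node it is called on, so mets must be a child.
    xml = (
        f'<wrapper><mets xmlns="{METS_NS}" xmlns:csip="{CSIP_NS}" '
        f'TYPE="{type_value}" csip:OTHERTYPE="MIXED"/></wrapper>'
    )
    wrapper = ET.fromstring(xml)
    # Same predicate as the profile XPath; the comparison is case-sensitive.
    matches = wrapper.findall("m:mets[@TYPE='OTHER']", {"m": METS_NS})
    if not matches:
        return None
    # Namespaced attributes are looked up in {namespace}name form.
    return matches[0].get(f"{{{CSIP_NS}}}OTHERTYPE")


print(othertype_via_profile_xpath("OTHER"))  # MIXED
print(othertype_via_profile_xpath("Other"))  # None
```

With TYPE="Other" the predicate selects nothing, so a rule located by this XPath is never triggered for the vocabulary's own spelling.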

Validators

Commons IP

The commons IP validator, by contrast, accepts only the "Other" form. In effect it treats the specification as being in error wherever it uses "OTHER" and takes "Other" (from the fixed vocabulary) as the only valid form, as the following METS and validation result stubs show:

Using type "OTHER":

<?xml version="1.0" encoding="utf-8"?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:csip="https://DILCIS.eu/XML/METS/CSIPExtensionMETS" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.w3.org/1999/xlink http://www.loc.gov/standards/mets/xlink.xsd https://DILCIS.eu/XML/METS/CSIPExtensionMETS https://earkcsip.dilcis.eu/schema/DILCISExtensionMETS.xsd" PROFILE="https://earksip.dilcis.eu/profile/E-ARK-SIP.xml" TYPE="OTHER" csip:OTHERTYPE="MIXED">

This manifestation of the incongruity is particularly frustrating, because the requirement description quoted in the result (set mets/@TYPE to 'OTHER') directly contradicts the issue it reports ("Value OTHER is not valid"):

[
  {
    "specification" : "CSIP-2.1.0",
    "id" : "CSIP2",
    "name" : "Content Category",
    "location" : "mets/@TYPE",
    "description" : "The mets/@TYPE attribute MUST be used to declare the category of the content held in the package, e.g. book, journal, stereograph, video, etc.. Legal values are defined in a fixed vocabulary. When the content category used falls outside of the defined vocabulary the mets/@TYPE value must be set to 'OTHER' and the specific value declared in mets/@csip:OTHERTYPE. The vocabulary will develop under the curation of the DILCIS Board as additional content information type specifications are produced.",
    "cardinality" : "0..1",
    "level" : "MUST",
    "testing" : {
      "outcome" : "FAILED",
      "issues" : [ "Value OTHER is not valid in /home/cameron/mets-generator/samples/a46ab3d0-c710-4d73-b58d-e9330b53a82/representations/rep2/METS.xml.", "Value OTHER is not valid in /home/cameron/mets-generator/samples/a46ab3d0-c710-4d73-b58d-e93e30b53a82/representations/rep1/METS.xml.", "Value OTHER is not valid in Root METS-xml." ],
      "warnings" : [ ],
      "notes" : [ ]
    }
  },
  {
    "specification" : "CSIP-2.1.0",
    "id" : "CSIP3",
    "name" : "Other Content Category",
    "location" : "mets[@TYPE='OTHER']/@csip:OTHERTYPE",
    "description" : "When the mets/@TYPE attribute has the value 'OTHER' the mets/@csip:OTHERTYPE attribute MUST be used to declare the content category of the package/representation.",
    "cardinality" : "0..1",
    "level" : "SHOULD",
    "testing" : {
      "outcome" : "PASSED",
      "issues" : [ ],
      "warnings" : [ ],
      "notes" : [ ]
    }
  }
]

Successfully using type 'Other':

<?xml version='1.0' encoding='utf-8'?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:csip="https://DILCIS.eu/XML/METS/CSIPExtensionMETS" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" csip:CONTENTINFORMATIONTYPE="MIXED" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.w3.org/1999/xlink http://www.loc.gov/standards/mets/xlink.xsd https://DILCIS.eu/XML/METS/CSIPExtensionMETS https://earkcsip.dilcis.eu/schema/DILCISExtensionMETS.xsd" OBJID="a46ab3d0-c710-4d73-b58d-e9330b53a82" LABEL="CSIP Information Package" PROFILE="https://earksip.dilcis.eu/profile/E-ARK-SIP.xml" TYPE="Other" csip:OTHERTYPE="MIXED">

Result:

[
  {
    "specification" : "CSIP-2.1.0",
    "id" : "CSIP2",
    "name" : "Content Category",
    "location" : "mets/@TYPE",
    "description" : "The mets/@TYPE attribute MUST be used to declare the category of the content held in the package, e.g. book, journal, stereograph, video, etc.. Legal values are defined in a fixed vocabulary. When the content category used falls outside of the defined vocabulary the mets/@TYPE value must be set to 'OTHER' and the specific value declared in mets/@csip:OTHERTYPE. The vocabulary will develop under the curation of the DILCIS Board as additional content information type specifications are produced.",
    "cardinality" : "0..1",
    "level" : "MUST",
    "testing" : {
      "outcome" : "PASSED",
      "issues" : [ ],
      "warnings" : [ ],
      "notes" : [ ]
    }
  },
  {
    "specification" : "CSIP-2.1.0",
    "id" : "CSIP3",
    "name" : "Other Content Category",
    "location" : "mets[@TYPE='OTHER']/@csip:OTHERTYPE",
    "description" : "When the mets/@TYPE attribute has the value 'OTHER' the mets/@csip:OTHERTYPE attribute MUST be used to declare the content category of the package/representation.",
    "cardinality" : "0..1",
    "level" : "SHOULD",
    "testing" : {
      "outcome" : "PASSED",
      "issues" : [ ],
      "warnings" : [ ],
      "notes" : [ ]
    }
  }
]

The commons IP validator implements the fixed type vocabulary, so the only valid values for the TYPE attribute are those in the fixed content category vocabulary, with "Other" filling the intended role of "not listed".
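To make the observed behaviour concrete, here is a minimal sketch of a strict, case-sensitive vocabulary lookup. It is not the commons IP implementation. The function name `check_csip2` and the abbreviated vocabulary set are assumptions for illustration, and only values quoted in this issue are listed.

```python
# Abbreviated, illustrative subset of the fixed content category vocabulary.
# The real vocabulary has more entries; these are the ones quoted in this issue.
CONTENT_CATEGORY_VOCAB = {"Datasets", "Websites", "Other"}


def check_csip2(type_value):
    """Mimic a case-sensitive vocabulary check on mets/@TYPE.

    Returns an (outcome, issues) pair loosely shaped like the report above.
    """
    if type_value in CONTENT_CATEGORY_VOCAB:
        return ("PASSED", [])
    return ("FAILED", [f"Value {type_value} is not valid"])


print(check_csip2("Other"))  # ('PASSED', [])
print(check_csip2("OTHER"))  # ('FAILED', ['Value OTHER is not valid'])
```

Under this kind of check, the spelling that the CSIP2 text mandates is precisely the one that gets rejected.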

Python validator

The Python validator expresses CSIP2 and CSIP3 with the following unimplemented Schematron rules, which use the "OTHER" form:

<assert id="CSIP2" role="ERROR" test="@TYPE">The mets/@TYPE attibute [SIC] MUST be used to declare the category of the content held in the package, e.g. book, journal, stereograph, video, etc.. Legal values are defined in a fixed vocabulary.</assert>
<assert id="CSIP3" role="WARN" test="(@TYPE = 'OTHER' and @csip:OTHERTYPE) or @TYPE != 'OTHER'">When the content category used falls outside of the defined vocabulary the mets/@TYPE value must be set to “OTHER” and the specific value declared in mets/@csip:OTHERTYPE. The vocabulary will develop under the curation of the DILCIS Board as additional content information type specifications are produced.</assert>

However, because these rules do not implement the fixed vocabulary, any string is accepted as the type value, so "Other" with no csip:OTHERTYPE would be valid. The intention of the current test is still clear, though.
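The two Schematron tests can be mirrored in plain Python to see which attribute combinations they accept. This is a hand translation under stated assumptions, not code from the Python validator. The function names `csip2` and `csip3` and the dict-of-attributes input are hypothetical, and a missing @TYPE is treated as making both comparisons false, as XPath does for an empty node-set.

```python
def csip2(attrs):
    """Mirror test="@TYPE": passes whenever the attribute is present."""
    return "TYPE" in attrs


def csip3(attrs):
    """Mirror test="(@TYPE = 'OTHER' and @csip:OTHERTYPE) or @TYPE != 'OTHER'"."""
    type_value = attrs.get("TYPE")
    if type_value is None:
        # In XPath, comparing an empty node-set with a string is false
        # for both = and !=, so the whole expression is false.
        return False
    has_othertype = "csip:OTHERTYPE" in attrs
    return (type_value == "OTHER" and has_othertype) or type_value != "OTHER"


cases = [
    {"TYPE": "OTHER", "csip:OTHERTYPE": "MIXED"},
    {"TYPE": "OTHER"},
    {"TYPE": "Other"},
    {"TYPE": "anything at all"},
]
for attrs in cases:
    print(attrs, csip2(attrs), csip3(attrs))
```

Only the second case (TYPE="OTHER" without csip:OTHERTYPE) fails CSIP3. Every other string, including "Other" on its own, passes both rules.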

Each validator can make a case that its approach is the correct one. But as commons IP demonstrates, any validator that relies directly on both the vocabulary and the requirement text will necessarily produce the misleading "Other" error, rejecting "OTHER" while quoting a requirement that demands it.

I'm not sure what the original intent of CSIP2 and CSIP3 was, so I cannot confidently say whether the solution is to unify the casing of the "Other" type, or which casing is the correct one. As you can see, a resolution could require changes to several documents and programmes maintained by different stakeholders.