DILCISBoard / E-ARK-CSIP

E-ARK Common Specification for Information Packages
http://earkcsip.dilcis.eu
Creative Commons Attribution 4.0 International

PROPOSAL: improve encoding of XPath and cardinality #707

Open carlwilson opened 12 months ago

carlwilson commented 12 months ago

Currently, the METS Profiles for the E-ARK specifications use some "free form" XHTML fields allowed in requirement descriptions to encode information used in the published specifications, specifically the XPath expression for the element/attribute and the "Cardinality", i.e. how many occurrences are permissible. These are recorded using adapted definition term (<dt>) and definition description (<dd>) pairs, like so:

<requirement ID="SIP19" REQLEVEL="MAY" EXAMPLES="metsHdrElementExample1 metsHdrAgentExample2">
    <description>
        <head>Submitting agent additional information</head>
        <p xmlns="http://www.w3.org/1999/xhtml">The submitting agent has a note providing a unique identification code for the archival creator.</p>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>metsHdr/agent/note</dd>
            <dt>Cardinality</dt><dd>0..1</dd>
        </dl>
    </description>
</requirement>
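
For illustration only, here is a minimal Python sketch of how a publishing script might read those <dt>/<dd> pairs back out of a requirement. It assumes the requirement is available as a standalone, un-namespaced fragment exactly as shown above; the function name `read_dl_pairs` is hypothetical and is not part of any E-ARK tooling.

```python
import xml.etree.ElementTree as ET

# XHTML namespace used by the <dl> block in the profile snippet above.
XHTML = "{http://www.w3.org/1999/xhtml}"

# Trimmed copy of the SIP19 requirement, used here as sample input.
REQUIREMENT = """
<requirement ID="SIP19" REQLEVEL="MAY">
    <description>
        <head>Submitting agent additional information</head>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>metsHdr/agent/note</dd>
            <dt>Cardinality</dt><dd>0..1</dd>
        </dl>
    </description>
</requirement>
"""


def read_dl_pairs(requirement_xml):
    """Return the <dt>/<dd> pairs of a requirement as a dict (hypothetical helper)."""
    root = ET.fromstring(requirement_xml)
    dl = root.find(f"description/{XHTML}dl")
    if dl is None:
        return {}
    children = list(dl)
    # Assumes strict alternation: each <dt> is immediately followed by its <dd>.
    return {
        term.text.strip(): definition.text.strip()
        for term, definition in zip(children[0::2], children[1::2])
    }


pairs = read_dl_pairs(REQUIREMENT)
print(pairs["METS XPath"], pairs["Cardinality"])
```

The sketch relies on the convention that every term is followed by exactly one definition, which is the kind of implicit rule the free-form encoding depends on.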

The METS Profile schema makes specific allowances for recording automated tests for a requirement with an XML bias. The tests element is a container for test elements that can be used to record XPath tests, and indeed the Schematron validation rules. Using some of the CSIP elements as examples:

<requirement ID="CSIP10" REQLEVEL="MUST">
    <description>
        <head>Agent</head>
        <p xmlns="http://www.w3.org/1999/xhtml">A mandatory agent element records the software used to create the package. Other uses of agents may be described in any local implementations that extend the profile.</p>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>mets/metsHdr/agent</dd>
            <dt>Cardinality</dt><dd>1..n</dd>
        </dl>
    </description>
</requirement>
<requirement ID="CSIP11" REQLEVEL="MUST" EXAMPLES="metsHdrElementExample1">
    <description>
        <head>Agent role</head>
        <p xmlns="http://www.w3.org/1999/xhtml">The mandatory agent element MUST have a `@ROLE` attribute with the value “CREATOR”.</p>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>mets/metsHdr/agent[@ROLE='CREATOR']</dd>
            <dt>Cardinality</dt><dd>1..1</dd>
        </dl>
    </description>
</requirement>

These could be encoded so:

<requirement ID="CSIP10" REQLEVEL="MUST">
    <description>
        <head>Agent</head>
        <p xmlns="http://www.w3.org/1999/xhtml">A mandatory agent element records the software used to create the package. Other uses of agents may be described in any local implementations that extend the profile.</p>
    </description>
    <tests>
        <test ID="TEST10-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
            <testWrap>
                <testXML>/mets/metsHdr/agent[@ROLE="CREATOR" and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE']</testXML>
            </testWrap>
        </test>
        <test ID="TEST10-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO" TESTLANGUAGEURI="http://purl.oclc.org/dsdl/schematron">
            <testWrap>
                <testXML>
                    <iso:rule context="/mets:mets/mets:metsHdr">
                    <iso:assert id="CSIP10" role="ERROR" test="count(mets:agent)&gt;=1">The metsHdr element MUST contain an agent element that records the software used to create the package.</iso:assert>
                    </iso:rule>
                </testXML>
            </testWrap>
        </test>
    </tests>
</requirement>
<requirement ID="CSIP11" REQLEVEL="MUST" EXAMPLES="metsHdrElementExample1">
    <description>
        <head>Agent role</head>
        <p xmlns="http://www.w3.org/1999/xhtml">The mandatory agent element MUST have a `@ROLE` attribute with the value “CREATOR”.</p>
    </description>
    <tests>
        <test ID="TEST11-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
            <testWrap>
                <testXML>/mets/metsHdr/agent[@ROLE="CREATOR"]</testXML>
            </testWrap>
        </test>
        <test ID="TEST11-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO" TESTLANGUAGEURI="http://purl.oclc.org/dsdl/schematron">
            <testWrap>
                <testXML>
                    <iso:rule context="/mets:mets/mets:metsHdr">
                    <iso:assert id="CSIP11" role="ERROR" test="count(mets:agent[@ROLE = 'CREATOR'])=1">The agent element MUST have a ROLE attribute with the value "CREATOR".</iso:assert>
                    </iso:rule>
                </testXML>
            </testWrap>
        </test>
    </tests>
</requirement>
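
As a rough sketch of how tooling could consume this encoding, the Python below lists the tests recorded for a requirement along with their expressions. It assumes a standalone fragment with the `iso` prefix bound to the Schematron namespace given in TESTLANGUAGEURI; the sample is a trimmed copy of CSIP11 and `list_tests` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# ISO Schematron namespace, as given in the TESTLANGUAGEURI attribute.
ISO = "{http://purl.oclc.org/dsdl/schematron}"

# Trimmed CSIP11 requirement; the xmlns:iso binding is an assumption added
# so that the fragment parses on its own.
REQUIREMENT = """
<requirement ID="CSIP11" REQLEVEL="MUST" xmlns:iso="http://purl.oclc.org/dsdl/schematron">
    <tests>
        <test ID="TEST11-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
            <testWrap>
                <testXML>/mets/metsHdr/agent[@ROLE="CREATOR"]</testXML>
            </testWrap>
        </test>
        <test ID="TEST11-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO">
            <testWrap>
                <testXML>
                    <iso:rule context="/mets:mets/mets:metsHdr">
                        <iso:assert id="CSIP11" role="ERROR" test="count(mets:agent[@ROLE = 'CREATOR'])=1">The agent element MUST have a ROLE attribute with the value "CREATOR".</iso:assert>
                    </iso:rule>
                </testXML>
            </testWrap>
        </test>
    </tests>
</requirement>
"""


def list_tests(requirement_xml):
    """Return (test ID, language, expressions) for each recorded test (hypothetical helper)."""
    root = ET.fromstring(requirement_xml)
    found = []
    for test in root.iter("test"):
        body = test.find("testWrap/testXML")
        if body is None:
            continue
        if test.get("TESTLANGUAGE") == "Schematron":
            # Schematron tests carry their expression in the assert's test attribute.
            expressions = [a.get("test") for a in body.iter(f"{ISO}assert")]
        else:
            # XPath tests carry their expression as the text of testXML.
            expressions = [(body.text or "").strip()]
        found.append((test.get("ID"), test.get("TESTLANGUAGE"), expressions))
    return found


tests = list_tests(REQUIREMENT)
for test_id, language, expressions in tests:
    print(test_id, language, expressions)
```
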
carlwilson commented 12 months ago

Note that the actual requirement and cardinality might be more succinctly put as:

  <testXML>
      <iso:rule context="/mets:mets/mets:metsHdr">
      <iso:assert id="CSIP10" role="ERROR" test="count(mets:agent[@ROLE='CREATOR' and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE'])=1">The metsHdr element MUST contain an agent element that records the software used to create the package.</iso:assert>
      </iso:rule>
  </testXML>

This might help avoid the ambiguity expressed in #705, where the note element XPath is explicit:

  <requirement ID="CSIP16" REQLEVEL="MUST" RELATEDMAT="VocabularyNoteType" EXAMPLES="metsHdrElementExample1">
      <description>
          <head>Classification of the agent additional information</head>
          <p xmlns="http://www.w3.org/1999/xhtml">The mandatory agent element's note child has a `@csip:NOTETYPE` attribute with a fixed value of "SOFTWARE VERSION".</p>
      </description>
...
        <test ID="TEST16-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
            <testWrap>
                <testXML>/mets/metsHdr/agent[@ROLE="CREATOR" and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE']/note[@csip:NOTETYPE='SOFTWARE VERSION']</testXML>
            </testWrap>
        </test>
        <test ID="TEST16-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO" TESTLANGUAGEURI="http://purl.oclc.org/dsdl/schematron">
            <testWrap>
                <testXML>
                    <iso:rule context="/mets:mets/mets:metsHdr/mets:agent[@ROLE = 'CREATOR' and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE']">
                    <iso:assert id="CSIP16" role="ERROR" test="count(mets:note[@csip:NOTETYPE='SOFTWARE VERSION'])=1">The mandatory agent element’s note child has a @csip:NOTETYPE attribute with a fixed value of “SOFTWARE VERSION”.</iso:assert>
                    </iso:rule>
                </testXML>
            </testWrap>
        </test>
...
  </requirement>
karinbredenberg commented 12 months ago

The issue is going to be discussed by the DILCIS Board

jmaferreira commented 11 months ago

This issue was discussed on the DILCIS Board (2023-09-13).

It is not clear how the cardinality information will be rendered in the output document. The cardinality used to be concise and explicitly written in the profile. If we move it into testWrap, it is not clear how the same simple, human-readable output is going to be produced.

Can we maintain both options in the same requirement element? One would serve the user documentation and the other would serve the automatic validation machine.

@carlwilson Can you provide more information on this?

stephenmackey commented 9 months ago

I understand the point made by @jmaferreira, but from the profile creation and maintenance perspective, having two different ways of encoding the same information for each requirement in a single profile is very problematic. I would also like the project managers to comment on how much time is available on each specification for reworking the METS profiles; this seems like a lot of unplanned work.

jmaferreira commented 9 months ago

I agree... having two ways of encoding the same thing is a risk. If testWrap can be used to produce the previous human-readable output, I'm all for it.

stephenmackey commented 9 months ago

Agreed.

karinbredenberg commented 9 months ago

@carlwilson could you add some clarifications?

carlwilson commented 8 months ago

Hi @jmaferreira, you've noticed the possible issue. My belief is that I can derive the cardinality for publication on the website and PDF from the recorded tests. I'll admit I think that there might be an issue or two with this IRL. If that's the case we will retain the cardinality mark-up so that it's explicit. I don't think that changing the form of the XPath and adding the Schematron rules is a problem. @stephenmackey is also working on this.
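
Purely as an illustration of what such a derivation could look like (this is an assumption, not the approach actually implemented), the Python sketch below maps a simple Schematron `count(...)` comparison to the familiar cardinality notation. The function name `cardinality_from_test` is hypothetical, and the pattern only handles a single `count()` compared against an integer.

```python
import re

# Matches a whole expression of the form count(<anything>) <op> <integer>.
COUNT_TEST = re.compile(r"^\s*count\(.+\)\s*(>=|<=|=)\s*(\d+)\s*$")


def cardinality_from_test(test_expr):
    """Map a simple count() assertion to 'min..max' notation, or None if unsupported."""
    match = COUNT_TEST.match(test_expr)
    if match is None:
        return None
    operator, bound = match.group(1), int(match.group(2))
    if operator == "=":
        return f"{bound}..{bound}"   # exactly <bound> occurrences
    if operator == ">=":
        return f"{bound}..n"         # at least <bound> occurrences
    return f"0..{bound}"             # "<=": at most <bound> occurrences


print(cardinality_from_test("count(mets:agent)>=1"))
print(cardinality_from_test("count(mets:agent[@ROLE = 'CREATOR'])=1"))
```

Compound assertions (for example two `count()` comparisons joined with `and`) or tests that do not use `count()` at all are not handled reliably by this pattern, which is one place where the "issue or two" could show up and the explicit cardinality mark-up would still be needed.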

karinbredenberg commented 7 months ago

The suggestion is:

Board members' acknowledgment of the issue: Tick the box in front of your name to indicate that you have looked at the suggestion.

Voting (Decision making will be carried out on the basis of majority voting by all eligible members of the Board. In the case of a tied vote, decisions will be made at the discretion of the Chair)

Tick the box in front of your name to say yes to the suggestion.

karinbredenberg commented 6 months ago

7 DILCIS Board members have acknowledged the issue. 7 DILCIS Board members agree with the solution.

The suggested updated encoding will be part of the next release of the specifications.