archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: metsrw.premisrw dynamic accessor returns unexpected values (mets-reader-writer) #743

Open sevein opened 5 years ago

sevein commented 5 years ago

Given the following PREMIS event:

<mets:digiprovMD ID="digiprovMD_435" CREATED="2019-06-13T16:33:56">
  <mets:mdWrap MDTYPE="PREMIS:EVENT">
    <mets:xmlData>
      <premis:event xmlns:premis="http://www.loc.gov/premis/v3" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
        <premis:eventIdentifier>
          <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
          <premis:eventIdentifierValue>None</premis:eventIdentifierValue>
        </premis:eventIdentifier>
        <premis:eventType>name cleanup</premis:eventType>
        <premis:eventDateTime>2019-06-13 16:33:48.266892+00:00</premis:eventDateTime>
        <premis:eventDetailInformation>
          <premis:eventDetail>prohibited characters removed: program="sanitize_names"; version="1.10.db4f6aca278e6daf2af160ed40349baf7c6f53af"</premis:eventDetail>
        </premis:eventDetailInformation>
        <premis:eventOutcomeInformation>
          <premis:eventOutcome/>
          <premis:eventOutcomeDetail>
            <premis:eventOutcomeDetailNote/>
          </premis:eventOutcomeDetail>
        </premis:eventOutcomeInformation>
        <premis:linkingAgentIdentifier>
          <premis:linkingAgentIdentifierType>preservation system</premis:linkingAgentIdentifierType>
          <premis:linkingAgentIdentifierValue>Archivematica-1.10</premis:linkingAgentIdentifierValue>
        </premis:linkingAgentIdentifier>
        <premis:linkingAgentIdentifier>
          <premis:linkingAgentIdentifierType>repository code</premis:linkingAgentIdentifierType>
          <premis:linkingAgentIdentifierValue>test</premis:linkingAgentIdentifierValue>
        </premis:linkingAgentIdentifier>
        <premis:linkingAgentIdentifier>
          <premis:linkingAgentIdentifierType>Archivematica user pk</premis:linkingAgentIdentifierType>
          <premis:linkingAgentIdentifierValue>1</premis:linkingAgentIdentifierValue>
        </premis:linkingAgentIdentifier>
      </premis:event>
    </mets:xmlData>
  </mets:mdWrap>
</mets:digiprovMD>

If premisrw is used to access premis:eventOutcomeDetailNote using the following form:

item.event_outcome_detail_note

The value returned is:

(('event_outcome_detail_note',),)

However, if there was at least one premis:eventOutcomeDetailNote populated, the same accessor would return the value of the first match.

I find this behaviour counterintuitive, but it's also not clear what the best solution would be. A change could break backward compatibility, but would that be a big problem? Who is relying on the returned value when there is a mismatch? Should the accessor raise an error? Return a falsy value?

Your environment (version of Archivematica, OS version, etc)

Since metsrw.premisrw was conceived?



ross-spencer commented 4 years ago

Having investigated this for some METS parsing work, I agree it is counter-intuitive. A potential pattern for some, which might not break in the future, is as follows (with additional annotation):

    import metsrw

    # Assumption: "METS.xml" is a placeholder path to the METS file to parse.
    mets = metsrw.METSDocument.fromfile("METS.xml")

    def do_something_interesting(value):
        """Placeholder for whatever the caller does with the string."""
        print(value)

    for fsentry in mets.all_files():
        for premis_event in fsentry.get_premis_events():
            something_interesting = None
            detail = premis_event.event_detail
            if not isinstance(detail, tuple):
                # Handle string: usually the string we're interested in,
                # e.g. AV output.
                something_interesting = detail
            # Otherwise, we have a tuple, which looks as follows, and
            # we can ignore:
            #
            #    detail <-- tuple
            #    detail[0] <-- abstract base class
            #    detail[0].data <-- abstract base class data (tuple)
            #    detail[0].data[0], e.g. field_name e.g. `event detail`
            if something_interesting is not None:
                do_something_interesting(something_interesting)
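
The `isinstance` check above can also be factored into a small caller-side helper that covers both options raised earlier in the thread (return a falsy value, or raise an error). This is only a sketch under assumptions: `premis_text` and its `strict` flag are hypothetical names, not part of metsrw, and it assumes Python 3, where a populated element comes back as `str`.

```python
from typing import Any, Optional


def premis_text(value: Any, strict: bool = False) -> Optional[str]:
    """Return `value` if it is a string, otherwise None.

    Hypothetical helper, not part of metsrw. With strict=True a
    non-string result raises ValueError instead of returning None.
    """
    if isinstance(value, str):
        return value
    if strict:
        raise ValueError(f"accessor did not resolve to text: {value!r}")
    return None


# The tuple reported in this issue for an empty eventOutcomeDetailNote.
unresolved = (("event_outcome_detail_note",),)

print(premis_text("prohibited characters removed"))
print(premis_text(unresolved))
```

Called as `premis_text(premis_event.event_detail)`, this keeps the tuple handling in one place, so callers would only need to change the helper if the accessor's behaviour is changed later.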