artefactual-labs / mets-reader-writer

Library to parse and create METS files, especially for Archivematica.
https://mets-reader-writer.readthedocs.io
GNU Affero General Public License v3.0

Problem: XML Schema Location isn't output when creating a PREMIS event #44

Closed ross-spencer closed 6 years ago

ross-spencer commented 6 years ago

If we do something along the lines of:

import uuid
from datetime import datetime

import lxml.etree
from metsrw.plugins import premisrw


def generate_event():
    # Add some new EVENTS to our METS
    return ('event',
            ('event_identifier',
                ('event_identifier_type', "UUID"),
                ('event_identifier_value', uuid.uuid4())),
            ('event_type', "AM CAMP DEMO"),
            ('event_date_time', datetime.now().isoformat()),
            ('event_detail', "Adding new PREMIS EVENTS"),
            ('event_outcome_information',
                ('event_outcome', "SUCCESS"),
                ('event_outcome_detail',
                    ('event_outcome_detail_note', "dag iedereen!"))),
            ('linking_agent_identifier',
                ('linking_agent_identifier_type', "python script"),
                ('linking_agent_identifier_value', "1.0")))


# Serialize the event; no attribute dict is supplied for the root element.
print(lxml.etree.tostring(premisrw.data_to_premis(generate_event()),
                          pretty_print=True).decode('utf8'))

The output is as follows:

<premis:event xmlns:premis="info:lc/xmlns/premis-v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <premis:eventIdentifier>
    <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
    <premis:eventIdentifierValue>8557f11f-a4c0-447d-9f01-cdae5e41e535</premis:eventIdentifierValue>
  </premis:eventIdentifier>
  <premis:eventType>AM CAMP DEMO</premis:eventType>
  <premis:eventDateTime>2018-04-05T13:44:17.283713</premis:eventDateTime>
  <premis:eventDetail>Adding new PREMIS EVENTS</premis:eventDetail>
  <premis:eventOutcomeInformation>
    <premis:eventOutcome>SUCCESS</premis:eventOutcome>
    <premis:eventOutcomeDetail>
      <premis:eventOutcomeDetailNote>dag iedereen!</premis:eventOutcomeDetailNote>
    </premis:eventOutcomeDetail>
  </premis:eventOutcomeInformation>
  <premis:linkingAgentIdentifier>
    <premis:linkingAgentIdentifierType>python script</premis:linkingAgentIdentifierType>
    <premis:linkingAgentIdentifierValue>1.0</premis:linkingAgentIdentifierValue>
  </premis:linkingAgentIdentifier>
</premis:event>

When we add this event to an existing METS document and validate the document against our Schematron file, the result is:

Error: A digiprovMD mdWrap element MUST contain an XML schema location.

and the related error:

Unless MDTYPE is OTHER an mdRef element MUST contain an XML schema location.

Example METS output:

    <mets:digiprovMD ID="digiprovMD_231587" CREATED="2018-04-05T11:12:21">
      <mets:mdWrap MDTYPE="PREMIS:EVENT">
        <mets:xmlData>
          <premis:event xmlns:premis="info:lc/xmlns/premis-v2">
            <premis:eventIdentifier>
              <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
              <premis:eventIdentifierValue>65c8369f-2b47-49e0-be0c-08da6bbd8b24</premis:eventIdentifierValue>
            </premis:eventIdentifier>
            <premis:eventType>AM CAMP DEMO</premis:eventType>
            <premis:eventDateTime>2018-04-05T13:12:21.128197</premis:eventDateTime>
            <premis:eventDetail>Adding new PREMIS EVENTS</premis:eventDetail>
            <premis:eventOutcomeInformation>
              <premis:eventOutcome>SUCCESS</premis:eventOutcome>
              <premis:eventOutcomeDetail>
                <premis:eventOutcomeDetailNote>dag iedereen!</premis:eventOutcomeDetailNote>
              </premis:eventOutcomeDetail>
            </premis:eventOutcomeInformation>
            <premis:linkingAgentIdentifier>
              <premis:linkingAgentIdentifierType>python script</premis:linkingAgentIdentifierType>
              <premis:linkingAgentIdentifierValue>1.0</premis:linkingAgentIdentifierValue>
            </premis:linkingAgentIdentifier>
          </premis:event>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:digiprovMD>
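
The failing rule can be approximated locally before running the full Schematron validation. The following is only a sketch using the standard library: it assumes the rule is satisfied when an xsi:schemaLocation attribute appears on the mdWrap element or anywhere inside it, which may not match the actual Schematron assertion. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_LOCATION = "{%s}schemaLocation" % XSI_NS


def mdwraps_missing_schema_location(root):
    """Return the MDTYPE of each mdWrap with no xsi:schemaLocation inside it.

    Assumption: the attribute may sit on the mdWrap itself or on any
    descendant element (e.g. the wrapped premis:event root).
    """
    missing = []
    for md_wrap in root.iter("{%s}mdWrap" % METS_NS):
        # Element.iter() yields the element itself and all its descendants.
        if not any(SCHEMA_LOCATION in el.attrib for el in md_wrap.iter()):
            missing.append(md_wrap.get("MDTYPE"))
    return missing


# Hypothetical, heavily trimmed METS document: the first mdWrap lacks a
# schema location, the second one carries it on the wrapped element.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <mets:amdSec>
    <mets:digiprovMD ID="digiprovMD_1">
      <mets:mdWrap MDTYPE="PREMIS:EVENT">
        <mets:xmlData>
          <premis:event xmlns:premis="info:lc/xmlns/premis-v2"/>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:digiprovMD>
    <mets:digiprovMD ID="digiprovMD_2">
      <mets:mdWrap MDTYPE="PREMIS:AGENT">
        <mets:xmlData>
          <premis:agent xmlns:premis="info:lc/xmlns/premis-v2"
                        xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-2.xsd"/>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:digiprovMD>
  </mets:amdSec>
</mets:mets>
"""

print(mdwraps_missing_schema_location(ET.fromstring(SAMPLE)))
```

With the sample above, only the PREMIS:EVENT wrapper is reported, which mirrors the situation in the METS output shown in this comment.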

I believe we need xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-2.xsd" to persist into the output, as per the comment: https://github.com/artefactual-labs/mets-reader-writer/blob/master/metsrw/plugins/premisrw/premis.py#L586

jrwdunham commented 6 years ago

@ross-spencer you have to supply the XML attributes of the root <premis:event> element manually. This is made easier by premisrw.PREMIS_META:

>>> premisrw.PREMIS_META
{'xsi:schema_location': 'info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-2.xsd', 
 'version': '2.2'}

Then it should work:

>>> def generate_event_with_meta():
...     # Add some new EVENTS to our METS
...     return ('event',
...             premisrw.PREMIS_META,
...             ('event_identifier',
...                 ('event_identifier_type', "UUID"),
...                 ('event_identifier_value', uuid.uuid4())),
...             ('event_type', "AM CAMP DEMO"),
...             ('event_date_time', datetime.now().isoformat()),
...             ('event_detail', "Adding new PREMIS EVENTS"),
...             ('event_outcome_information',
...                 ('event_outcome', "SUCCESS"),
...                 ('event_outcome_detail',
...                     ('event_outcome_detail_note',
...                         "dag iedereen!"))),
...             ('linking_agent_identifier',
...                 ('linking_agent_identifier_type', "python script"),
...                 ('linking_agent_identifier_value', "1.0")))
>>> print(lxml.etree.tostring(premisrw.data_to_premis(generate_event_with_meta()), pretty_print=True).decode('utf8'))
<premis:event xmlns:premis="info:lc/xmlns/premis-v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-2.xsd" version="2.2">
  <premis:eventIdentifier>
    <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
    <premis:eventIdentifierValue>1665f868-dff5-4e66-8a32-f38ad7582309</premis:eventIdentifierValue>
  </premis:eventIdentifier>
  <premis:eventType>AM CAMP DEMO</premis:eventType>
  <premis:eventDateTime>2018-04-09T11:53:15.846384</premis:eventDateTime>
  <premis:eventDetail>Adding new PREMIS EVENTS</premis:eventDetail>
  <premis:eventOutcomeInformation>
    <premis:eventOutcome>SUCCESS</premis:eventOutcome>
    <premis:eventOutcomeDetail>
      <premis:eventOutcomeDetailNote>dag iedereen!</premis:eventOutcomeDetailNote>
    </premis:eventOutcomeDetail>
  </premis:eventOutcomeInformation>
  <premis:linkingAgentIdentifier>
    <premis:linkingAgentIdentifierType>python script</premis:linkingAgentIdentifierType>
    <premis:linkingAgentIdentifierValue>1.0</premis:linkingAgentIdentifierValue>
  </premis:linkingAgentIdentifier>
</premis:event>
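
If event tuples are already being built without attributes, the dict can also be added after the fact. The sketch below is not part of metsrw: with_premis_meta is a hypothetical helper, and PREMIS_META is copied from the value printed above only so that the example runs without metsrw installed (in real code, premisrw.PREMIS_META would be used instead). It relies on the convention shown in generate_event_with_meta, where the attribute dict is the second item of the tuple.

```python
# Copied from the premisrw.PREMIS_META value shown in this thread so that the
# example is self-contained; in real code, import it from premisrw instead.
PREMIS_META = {
    "xsi:schema_location": (
        "info:lc/xmlns/premis-v2 "
        "http://www.loc.gov/standards/premis/v2/premis-v2-2.xsd"
    ),
    "version": "2.2",
}


def with_premis_meta(data, meta=PREMIS_META):
    """Return a tuple with an attribute dict placed right after the element name.

    Hypothetical helper: tuples that already carry a dict in the second
    position are returned unchanged.
    """
    name, children = data[0], data[1:]
    if children and isinstance(children[0], dict):
        return data
    # Copy the dict so later mutation does not affect the shared constant.
    return (name, dict(meta)) + tuple(children)


event = (
    "event",
    ("event_type", "AM CAMP DEMO"),
    ("event_detail", "Adding new PREMIS EVENTS"),
)
event_with_meta = with_premis_meta(event)
print(event_with_meta[1]["version"])
```

The resulting tuple has the same shape as the one returned by generate_event_with_meta and could then be passed to premisrw.data_to_premis.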

jrwdunham commented 6 years ago

@ross-spencer can we close this or is it still an issue?

ross-spencer commented 6 years ago

@jrwdunham I apologise. I didn't see your response above. I am happy that this isn't an issue based on that. I will get back to my work with metsrw closer to June before the camp in Baltimore. We can reopen this if need be but it sounds unlikely.