archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: It isn't easy to write reusable code with mets-reader-writer (metsrw) that outputs only the PREMIS containers a user needs #1304

Open ross-spencer opened 4 years ago

ross-spencer commented 4 years ago

Please describe the problem you'd like to be solved

Creating a PREMIS event might be done as follows:

    premis_data = (
        "event",
        PREMIS_META,
        (
            "event_identifier",
            ("event_identifier_type", ID_TYPE),
            ("event_identifier_value", event.event_id),
        ),
        ("event_type", event.event_type),
        ("event_date_time", event.event_datetime),
        ("event_detail_information", ("event_detail", event.event_detail)),
        (
            "event_outcome_information",
            ("event_outcome", event.event_outcome),
            (
                "event_outcome_detail",
                ("event_outcome_detail_note", event.event_outcome_detail),
            ),
        ),
    )
    for agent in event.agents.all():
        premis_data += (
            (
                "linking_agent_identifier",
                ("linking_agent_identifier_type", agent.identifiertype),
                ("linking_agent_identifier_value", agent.identifiervalue),
            ),
        )

    for linking_object_uuid in linking_object_uuids:
        premis_data += (
            (
                "linkingObjectIdentifier",
                ("linking_object_identifier_type", ID_TYPE),
                ("linking_object_identifier_value", linking_object_uuid),
                ("linking_object_role", SOURCE_ROLE),
            ),
        )

    return metsrw.plugins.premisrw.data_to_premis(
        premis_data, premis_version=PREMIS_META["version"]
    )

This structure covers pretty much every event, so I can reuse it. But events use containers differently. A very rough summary of the events I've audited in Archivematica that output empty containers (none of which are mandatory in the PREMIS schema) looks as follows:

Ingestion
---------

    <premis:eventDetailInformation>
      <premis:eventDetail></premis:eventDetail>
    </premis:eventDetailInformation>

    <premis:eventOutcomeInformation>
      <premis:eventOutcome></premis:eventOutcome>
      <premis:eventOutcomeDetail>
        <premis:eventOutcomeDetailNote></premis:eventOutcomeDetailNote>
      </premis:eventOutcomeDetail>
    </premis:eventOutcomeInformation>

Registration
------------

    <premis:eventDetailInformation>
      <premis:eventDetail></premis:eventDetail>
    </premis:eventDetailInformation>            

    <premis:eventOutcomeInformation>
      <premis:eventOutcome></premis:eventOutcome>
      <premis:eventOutcomeDetail>
        <premis:eventOutcomeDetailNote>accession#DemoCSV1</premis:eventOutcomeDetailNote>
      </premis:eventOutcomeDetail>
    </premis:eventOutcomeInformation>    

Fixity check
------------

    <premis:eventOutcomeDetail>
      <premis:eventOutcomeDetailNote></premis:eventOutcomeDetailNote>
    </premis:eventOutcomeDetail>

Metadata extraction
-------------------

    <premis:eventDetailInformation>
      <premis:eventDetail></premis:eventDetail>
    </premis:eventDetailInformation>
    <premis:eventOutcomeInformation>
      <premis:eventOutcome></premis:eventOutcome>
      <premis:eventOutcomeDetail>
        <premis:eventOutcomeDetailNote>"METS-tools.15e219c3-0f51-4d32-80f4-577edfeceb05.xml#xpointer(id('techMD_1').xml"</premis:eventOutcomeDetailNote>
      </premis:eventOutcomeDetail>
    </premis:eventOutcomeInformation>  

Name cleanup
------------

    <premis:eventOutcomeInformation>
      <premis:eventOutcome></premis:eventOutcome>    

Normalization
-------------

    <premis:eventOutcomeInformation>
      <premis:eventOutcome></premis:eventOutcome>      

Creation
--------

    <premis:eventDetailInformation>
      <premis:eventDetail></premis:eventDetail>
    </premis:eventDetailInformation>
    <premis:eventOutcomeInformation>
      <premis:eventOutcome></premis:eventOutcome>
      <premis:eventOutcomeDetail>
        <premis:eventOutcomeDetailNote></premis:eventOutcomeDetailNote>
      </premis:eventOutcomeDetail>
    </premis:eventOutcomeInformation>

As a user, I might want to output containers only when they are actually used, but the nesting of tuples makes the structure a) difficult to construct conditionally, and b) difficult to filter once created without more thorough processing.
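To illustrate point b), here is a minimal sketch of what that "more thorough processing" could look like on the user's side. It assumes the tuple layout shown earlier (a tag name followed by child tuples, string values, or a metadata dict such as `PREMIS_META`). `prune_empty` is a hypothetical helper and not part of metsrw. It also prunes blindly, so an element that the schema requires would be dropped too if its value were empty.

```python
def prune_empty(node):
    """Return a copy of a premisrw-style tuple with empty parts removed.

    Hypothetical helper, not part of metsrw. Returns None when nothing
    but the tag name (and metadata dicts) would be left.
    """
    tag, children = node[0], node[1:]
    kept = []
    has_content = False
    for child in children:
        if isinstance(child, tuple):
            # Nested container or leaf element: prune it recursively.
            pruned = prune_empty(child)
            if pruned is not None:
                kept.append(pruned)
                has_content = True
        elif isinstance(child, dict):
            # Metadata such as PREMIS_META is kept but is not "content".
            kept.append(child)
        elif child not in (None, ""):
            # A non-empty text value.
            kept.append(child)
            has_content = True
    return (tag, *kept) if has_content else None


# Example modelled on the "Registration" event above.
# The dict stands in for PREMIS_META; its contents are an assumption.
data = (
    "event",
    {"version": "3.0"},
    ("event_type", "registration"),
    ("event_detail_information", ("event_detail", "")),
    (
        "event_outcome_information",
        ("event_outcome", ""),
        (
            "event_outcome_detail",
            ("event_outcome_detail_note", "accession#DemoCSV1"),
        ),
    ),
)
condensed = prune_empty(data)
```

In this example `condensed` keeps `event_type` and the populated `event_outcome_detail`, while the empty `event_detail_information` container and the empty `event_outcome` element are gone. The result could then be passed to `data_to_premis` in place of the original tuple.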

Describe the solution you'd like to see implemented

I want to be able to signal to metsrw that I don't want something: either to construct a condensed PREMIS representation more easily and with less code repetition, or to ask metsrw to optionally output a condensed PREMIS representation as required.
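As a rough idea of what the first option could feel like from the caller's side, the sketch below uses two small constructors that return `None` for anything empty, so that containers collapse on their own. The names `leaf`, `container`, and `outcome_information` are hypothetical and do not exist in metsrw. The only assumption about metsrw is the tuple layout shown in the example above.

```python
def leaf(tag, value):
    """Return (tag, value), or None when the value is None or empty."""
    return None if value in (None, "") else (tag, value)


def container(tag, *children):
    """Return (tag, child, ...) built from the non-None children.

    Returns None when no children remain, so an enclosing container
    can drop this one in turn.
    """
    kept = tuple(child for child in children if child is not None)
    return (tag,) + kept if kept else None


def outcome_information(outcome, detail_note):
    """Build event_outcome_information, omitting any unused parts."""
    return container(
        "event_outcome_information",
        leaf("event_outcome", outcome),
        container(
            "event_outcome_detail",
            leaf("event_outcome_detail_note", detail_note),
        ),
    )


# Both values empty: the whole container disappears.
nothing = outcome_information("", "")
# Only the detail note is set, as in the "Registration" event.
note_only = outcome_information("", "accession#DemoCSV1")
```

A caller would still need to skip `None` results when appending to `premis_data`, but the conditional logic lives in one place instead of being repeated for each container.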

Describe alternatives you've considered

For now, folks can use the verbose constructor to achieve this, but from what I can see, as the nesting gets deeper and more complex, parts of functions will need to be duplicated in other helper methods, which is redundant and less clean to write.

Additional context

Related https://github.com/artefactual-labs/mets-reader-writer/issues/43


ross-spencer commented 4 years ago

The situation might not be as dire as described, but I'm not sure. This is one approach that works (it took some thinking backwards and less linearly):

    premis_data = (
        "event",
        PREMIS_META,
        (
            "event_identifier",
            ("event_identifier_type", ID_TYPE),
            ("event_identifier_value", event.event_id),
        ),
        ("event_type", event.event_type),
        ("event_date_time", event.event_datetime),
    )

    if event.event_detail:
        premis_data += (
            ("event_detail_information", ("event_detail", event.event_detail)),
        )

    # Collect the children of event_outcome_information, skipping empty
    # values. eventOutcome comes before eventOutcomeDetail, as in the
    # unconditional structure above.
    event_outcome_info = ()
    if event.event_outcome:
        event_outcome_info += (("event_outcome", event.event_outcome),)
    if event.event_outcome_detail:
        event_outcome_info += (
            (
                "event_outcome_detail",
                ("event_outcome_detail_note", event.event_outcome_detail),
            ),
        )
    # Each child stays its own nested tuple; the container is only
    # emitted when at least one child is present.
    if event_outcome_info:
        premis_data += (("event_outcome_information",) + event_outcome_info,)

    for agent in event.agents.all():
        premis_data += (
            (
                "linking_agent_identifier",
                ("linking_agent_identifier_type", agent.identifiertype),
                ("linking_agent_identifier_value", agent.identifiervalue),
            ),
        )

    for linking_object_uuid in linking_object_uuids:
        premis_data += (
            (
                "linkingObjectIdentifier",
                ("linking_object_identifier_type", ID_TYPE),
                ("linking_object_identifier_value", linking_object_uuid),
                ("linking_object_role", SOURCE_ROLE),
            ),
        )

    return metsrw.plugins.premisrw.data_to_premis(
        premis_data, premis_version=PREMIS_META["version"]
    )

Ultimately, it might still be better to teach metsrw to treat empty or null values as instructions to leave a field out, or to adopt one of the other solutions described above.

sevein commented 4 years ago

Relates to https://github.com/archivematica/Issues/issues/743.