artefactual-sdps / preprocessing-sfa

preprocessing-sfa is an Enduro preprocessing workflow for SFA SIPs

Feature: capture PREMIS events for pre-ingest validation tasks #19

Open sallain opened 3 months ago

sallain commented 3 months ago

Is your feature request related to a problem? Please describe.

SFA SIPs will have some custom ingest validation tasks in their workflow that will: 

At present, none of these pre-ingest validation tasks are generating PREMIS events. This card aims to change that where possible.

In some cases it would be best to create PREMIS events at the package level (such as for the transfer structure validation). Since AM can't do that right now, we will focus only on the events we can add at the file level. For now, that means generating a validation event for each file in a package once it has been checked against the allowed file formats list during the ingest validation phase.

Describe the solution you'd like

Generate file-level PREMIS events where possible, and include them as part of a new, well-formed premis.xml file.

The first candidate is file format validation.
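
As a rough illustration of what a file-level format validation event could look like, here is a minimal Python sketch that builds one `premis:event` element with the standard library. The function name `build_validation_event`, the `eventType` value `validation`, and the `valid`/`invalid` outcome strings are assumptions made for this example, not values confirmed in this issue, and the workflow itself may build the XML differently.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Return a namespace-qualified PREMIS tag name."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_validation_event(passed, agent_uri, detail):
    """Build one premis:event element (hypothetical helper).

    The eventType and eventOutcome values are assumptions for this sketch.
    """
    event = ET.Element(q("event"))

    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"
    # Store the identifier as a string, not as a uuid.UUID object.
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())

    ET.SubElement(event, q("eventType")).text = "validation"
    ET.SubElement(event, q("eventDateTime")).text = datetime.now(
        timezone.utc
    ).isoformat()

    detail_info = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(detail_info, q("eventDetail")).text = detail

    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = (
        "valid" if passed else "invalid"
    )

    agent = ET.SubElement(event, q("linkingAgentIdentifier"))
    ET.SubElement(agent, q("linkingAgentIdentifierType")).text = "URI"
    ET.SubElement(agent, q("linkingAgentIdentifierValue")).text = agent_uri
    return event


if __name__ == "__main__":
    example = build_validation_event(
        passed=True,
        agent_uri="https://github.com/artefactual-sdps/preprocessing-sfa",
        detail="File format checked against the allowed file formats list",
    )
    print(ET.tostring(example, encoding="unicode"))
```

One such event would be generated per file, with the corresponding `premis:object` carrying a link to it.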

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Previously, we were generating a premis.xml file by combining individual PREMIS files found within the content directory. This work is being undone by #18.

sallain commented 3 months ago

Annotated PREMIS file for multiple objects: premis-annotated-multi.zip

sallain commented 3 months ago

In the sample PREMIS event, @fiver-watson provided a generic linkingAgentIdentifierValue of <premis:linkingAgentIdentifierValue>https://github.com/artefactual-sdps/preprocessing-base</premis:linkingAgentIdentifierValue> on the principle that this kind of validation is likely to be universally useful. However, since it's currently implemented for SFA only, through the child workflow, I would recommend pointing to this repo as the Agent (that is, the child workflow is the agent).

sallain commented 2 months ago

The structure of the PREMIS file looks good. However, I'm seeing two issues:

  1. Archivematica isn't able to load the events from the premis.xml due to incorrect associations between events and objects.


Looking at the PREMIS file that's generated, I can see that there are five objects in the package. I can see that there are six format validation events (there should be five, I think - not sure what's going on there). However, all of the objects are linked to just one of those events, rather than each object being linked to a separate event.

A similar issue happens with the structure validation and metadata validation events, except in those cases there is only one of each event. There needs to be one event for each object (even though that doesn't make sense, I know!)

  2. The value for <premis:eventType> needs to adhere to the PREMIS data dictionary, and the eventDetail and eventOutcomeDetailNote should provide more information. The correct values are:

Let me know if a mock-up of the premis.xml would be helpful.
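
The one-event-per-object requirement from the first issue can be checked mechanically before the package reaches Archivematica. The sketch below assumes a PREMIS 3 file in which each `premis:object` points at its events through `premis:linkingEventIdentifier`. The function name `find_link_problems` is made up for this example, and the check should be run per event type if a file legitimately carries several kinds of events.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"premis": "http://www.loc.gov/premis/v3"}


def find_link_problems(premis_xml):
    """Report events that are not linked from exactly one object.

    Hypothetical helper: returns a list of human-readable problem strings,
    empty when every event is linked from exactly one object.
    """
    root = ET.fromstring(premis_xml)

    # Identifiers of all events declared in the file.
    event_ids = [
        el.text
        for el in root.findall(
            "premis:event/premis:eventIdentifier/premis:eventIdentifierValue", NS
        )
    ]

    # Count how many times each event identifier is referenced by objects.
    links = Counter()
    for obj in root.findall("premis:object", NS):
        for link in obj.findall(
            "premis:linkingEventIdentifier/premis:linkingEventIdentifierValue", NS
        ):
            links[link.text] += 1

    problems = []
    for event_id in event_ids:
        count = links.get(event_id, 0)
        if count == 0:
            problems.append(f"event {event_id} is not linked from any object")
        elif count > 1:
            problems.append(f"event {event_id} is linked from {count} objects")
    for linked_id in links:
        if linked_id not in event_ids:
            problems.append(f"object links to unknown event {linked_id}")
    return problems
```

For the package described above (five objects all linked to the same event, six events in total), a check like this would flag the shared event as linked from five objects and the other five events as unlinked.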

mcantelon commented 2 months ago

PR ready for CR: https://github.com/artefactual-sdps/preprocessing-sfa/pull/31

sallain commented 1 month ago

@mcantelon Archivematica is throwing up the following error - I don't really understand what it means!

'UUID' object has no attribute 'replace'
Traceback (most recent call last):
  File "/usr/lib/archivematica/MCPClient/client/job.py", line 142, in JobContext
    yield
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 848, in call
    job.set_status(main(job))
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 839, in main
    save_events(valid_events, file_queryset, job.pyprint)
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 695, in save_events
    event["event_id"] = ensure_event_id_is_uuid(event["event_id"], printfn)
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 670, in ensure_event_id_is_uuid
    uuid.UUID(event_id, version=4)
  File "/usr/lib64/python3.9/uuid.py", line 174, in __init__
    hex = hex.replace('urn:', '').replace('uuid:', '')
AttributeError: 'UUID' object has no attribute 'replace'
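
For readers puzzled by this traceback: the last frames show `uuid.UUID(event_id, version=4)` receiving something that is already a `uuid.UUID` object rather than a string, and the constructor then calls `.replace()` on it. Why the event ID arrives as a UUID object is not visible from the traceback. The snippet below reproduces the error in isolation and shows one defensive way to normalise the value. The helper `ensure_uuid_string` is hypothetical and is not the Archivematica function or the fix that was merged.

```python
import uuid


def ensure_uuid_string(event_id):
    """Return event_id as a canonical UUID string (hypothetical helper).

    Accepts either a string or a uuid.UUID instance.
    """
    if isinstance(event_id, uuid.UUID):
        return str(event_id)
    # Raises ValueError if the string is not a valid UUID.
    return str(uuid.UUID(event_id))


# Reproduce the error from the traceback: passing a UUID object where
# the constructor expects a hex string.
already_a_uuid = uuid.uuid4()
try:
    uuid.UUID(already_a_uuid, version=4)
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(error_message)
print(ensure_uuid_string(already_a_uuid))
```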

mcantelon commented 2 weeks ago

PR to fix issues: https://github.com/artefactual-sdps/preprocessing-sfa/issues/19

mcantelon commented 2 weeks ago

Fix merged! :crossed_fingers: