sallain closed this issue 4 weeks ago
Annotated PREMIS file for multiple objects: premis-annotated-multi.zip
In the sample PREMIS event, @fiver-watson provided a generic linkingAgentIdentifierValue of <premis:linkingAgentIdentifierValue>https://github.com/artefactual-sdps/preprocessing-base</premis:linkingAgentIdentifierValue> on the principle that this kind of validation is likely to be universally useful. However, since it's currently implemented only for SFA, through the child workflow, I'd recommend pointing to this repo as the Agent instead (that is, the child workflow is the agent).
The structure of the PREMIS file looks good. However, I'm seeing two issues:
Looking at the generated PREMIS file, I can see that there are five objects in the package and six format validation events (I think there should be five; I'm not sure what's going on there). However, all of the objects are linked to just one of those events, rather than each object being linked to its own event.
A similar issue happens with the structure validation and metadata validation events, except that in those cases there is only one of each event. There needs to be one event for each object (even though that doesn't make sense, I know!)
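The linking problem described above can be checked mechanically. The following is a minimal sketch, assuming a PREMIS 3 file in which each object points at its event through linkingEventIdentifierValue; the function name `check_event_links` and the sample identifiers are hypothetical, and the sample XML is trimmed well below what a schema-valid file needs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"premis": "http://www.loc.gov/premis/v3"}

# Trimmed, illustrative sample: two objects that both point at event "e1".
SAMPLE = """
<premis:premis xmlns:premis="http://www.loc.gov/premis/v3" version="3.0">
  <premis:object>
    <premis:linkingEventIdentifier>
      <premis:linkingEventIdentifierValue>e1</premis:linkingEventIdentifierValue>
    </premis:linkingEventIdentifier>
  </premis:object>
  <premis:object>
    <premis:linkingEventIdentifier>
      <premis:linkingEventIdentifierValue>e1</premis:linkingEventIdentifierValue>
    </premis:linkingEventIdentifier>
  </premis:object>
  <premis:event>
    <premis:eventIdentifier>
      <premis:eventIdentifierValue>e1</premis:eventIdentifierValue>
    </premis:eventIdentifier>
  </premis:event>
  <premis:event>
    <premis:eventIdentifier>
      <premis:eventIdentifierValue>e2</premis:eventIdentifierValue>
    </premis:eventIdentifier>
  </premis:event>
</premis:premis>
"""


def check_event_links(premis_xml: str) -> dict:
    """Report events shared by several objects and events no object links to."""
    root = ET.fromstring(premis_xml)
    event_ids = [
        (el.text or "").strip()
        for el in root.findall(
            "premis:event/premis:eventIdentifier/premis:eventIdentifierValue", NS
        )
    ]
    # Count how many objects reference each event identifier.
    linked = Counter(
        (el.text or "").strip()
        for el in root.findall(
            "premis:object/premis:linkingEventIdentifier/"
            "premis:linkingEventIdentifierValue",
            NS,
        )
    )
    return {
        "shared": sorted(e for e, n in linked.items() if n > 1),
        "unlinked": sorted(e for e in event_ids if e not in linked),
    }


report = check_event_links(SAMPLE)
```

For a file with the desired one-event-per-object structure, both lists in the report would be empty.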
The <premis:eventType> needs to adhere to the PREMIS data dictionary, and the eventDetail and eventOutcomeDetailNote should provide more information. The correct values are:
eventType: validateStructure SHOULD BE validation
eventType: validateFileFormats SHOULD BE validation
eventType: validateMetadata SHOULD BE validation
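One way to picture the change: the three custom names collapse to the single data-dictionary term, and the distinction between them moves into eventDetail. The sketch below assumes that approach; the helper `normalize_event` is hypothetical, and the eventDetail wording is placeholder text that does not come from this issue.

```python
# Placeholder eventDetail wording keyed by the custom event names.
# The wording is an assumption for illustration only.
EVENT_DETAILS = {
    "validateStructure": "SIP structure validation",
    "validateFileFormats": "File format validation against the allowed formats list",
    "validateMetadata": "Metadata validation",
}


def normalize_event(custom_type: str) -> dict:
    """Map a custom event name to eventType 'validation' plus an eventDetail."""
    if custom_type not in EVENT_DETAILS:
        raise ValueError(f"unknown custom event type: {custom_type}")
    return {"eventType": "validation", "eventDetail": EVENT_DETAILS[custom_type]}


normalized = [normalize_event(name) for name in EVENT_DETAILS]
```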
Let me know if a mock-up of the premis.xml would be helpful.
PR ready for CR: https://github.com/artefactual-sdps/preprocessing-sfa/pull/31
@mcantelon Archivematica is throwing up the following error - I don't really understand what it means!
'UUID' object has no attribute 'replace'
Traceback (most recent call last):
  File "/usr/lib/archivematica/MCPClient/client/job.py", line 142, in JobContext
    yield
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 848, in call
    job.set_status(main(job))
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 839, in main
    save_events(valid_events, file_queryset, job.pyprint)
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 695, in save_events
    event["event_id"] = ensure_event_id_is_uuid(event["event_id"], printfn)
  File "/usr/lib/archivematica/MCPClient/clientScripts/load_premis_events_from_xml.py", line 670, in ensure_event_id_is_uuid
    uuid.UUID(event_id, version=4)
  File "/usr/lib64/python3.9/uuid.py", line 174, in __init__
    hex = hex.replace('urn:', '').replace('uuid:', '')
AttributeError: 'UUID' object has no attribute 'replace'
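For anyone reading the traceback: the last two frames show `uuid.UUID()` being handed a value that is already a `uuid.UUID` object. The constructor treats its first argument as a string and calls `.replace()` on it, which a UUID object does not have. The sketch below reproduces that and shows one possible guard; `ensure_uuid_string` is a hypothetical helper, and this is not necessarily how the merged fix works.

```python
import uuid


def ensure_uuid_string(event_id) -> str:
    """Hypothetical guard: accept either a UUID object or a UUID string."""
    if isinstance(event_id, uuid.UUID):
        return str(event_id)
    # Raises ValueError if the string is not a well-formed UUID.
    return str(uuid.UUID(str(event_id)))


already_parsed = uuid.uuid4()

# Reproduce the failure: passing a UUID object where a string is expected.
try:
    uuid.UUID(already_parsed, version=4)
    reproduced = False
except (AttributeError, TypeError):
    reproduced = True

safe_value = ensure_uuid_string(already_parsed)
```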
PR to fix issues: https://github.com/artefactual-sdps/preprocessing-sfa/issues/19
Fix merged! :crossed_fingers:
Is your feature request related to a problem? Please describe.
SFA SIPs will have some custom ingest validation tasks in their workflow that will:
At present, none of these pre-ingest validation tasks are generating PREMIS events. This card aims to change that where possible.
In some cases it would be best to create PREMIS events at the package level (such as for the transfer structure validation). Since AM can't do that right now, we will focus only on those events we can add at the file level. For now, that means generating a validation event for each file in a package once it has been checked against the allowed file formats list during the ingest validation phase.
Describe the solution you'd like
Generate file-level PREMIS events where possible, and include them as part of a new well-formed premis.xml file.
The first candidate is file format validation.
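As a rough illustration of the file-level approach, the sketch below builds a PREMIS 3 document with one validation event per file and links each object to its own event. It assumes the format check has already produced a list of (path, allowed) pairs; `build_premis`, the sample paths, the outcome values, and the detail wording are all hypothetical. The output is deliberately incomplete: it omits objectCharacteristics, xsi:type, and agents, so it would need more work to be schema-valid.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def sub(parent, tag, text=None):
    """Append a namespaced PREMIS child element, optionally with text."""
    el = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        el.text = text
    return el


def build_premis(results):
    """Build objects and events from (relative_path, is_allowed) tuples."""
    root = ET.Element(f"{{{PREMIS_NS}}}premis", {"version": "3.0"})
    pending = []
    for path, allowed in results:
        event_id = str(uuid.uuid4())  # a fresh event for every file
        obj = sub(root, "object")
        ident = sub(obj, "objectIdentifier")
        sub(ident, "objectIdentifierType", "UUID")
        sub(ident, "objectIdentifierValue", str(uuid.uuid4()))
        sub(obj, "originalName", path)
        link = sub(obj, "linkingEventIdentifier")
        sub(link, "linkingEventIdentifierType", "UUID")
        sub(link, "linkingEventIdentifierValue", event_id)
        pending.append((event_id, path, allowed))
    # Events follow the objects in the document.
    for event_id, path, allowed in pending:
        event = sub(root, "event")
        ident = sub(event, "eventIdentifier")
        sub(ident, "eventIdentifierType", "UUID")
        sub(ident, "eventIdentifierValue", event_id)
        sub(event, "eventType", "validation")
        sub(event, "eventDateTime", datetime.now(timezone.utc).isoformat())
        detail = sub(event, "eventDetailInformation")
        sub(detail, "eventDetail", "File format checked against allowed formats list")
        outcome = sub(event, "eventOutcomeInformation")
        sub(outcome, "eventOutcome", "pass" if allowed else "fail")
        note = sub(outcome, "eventOutcomeDetail")
        sub(note, "eventOutcomeDetailNote", f"Format check result for {path}")
    return root


doc = build_premis([("a.pdf", True), ("b.xml", True), ("c.exe", False)])
xml_text = ET.tostring(doc, encoding="unicode")
```

Generating the event identifier at the same time as the object keeps the pairing explicit, which is the property the review comments in this thread ask for.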
Additional context
Previously, we were generating a premis.xml file by combining individual PREMIS files found within the content directory. This work is being undone by #18.