Closed sallain closed 1 week ago
Note: I've only listed validating against the schema as a first iteration. Other checks might include:
Note also that this will be used repeatedly by any Enduro user performing custom ingest activities that might generate PREMIS, and/or by anyone submitting their own PREMIS files with a SIP. For this reason, it should ideally be implemented as a reusable Temporal activity rather than a client-specific child workflow.
@mcantelon also, as discussed in the meeting today:
Let's make this a general "Validate XML" task for its first pass, that can accept both a file to validate and a schema file to use for the validation as inputs.
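To make the two-input shape concrete, here is a minimal, language-agnostic sketch in Python (not Enduro's actual Go/Temporal code). The function name `validate_xml` and the `schema_check` hook are hypothetical. The standard library can check well-formedness but has no XSD validator, so the schema step is delegated to an injected callable, which in practice might be backed by something like lxml's `etree.XMLSchema` or `xmllint --schema`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Hypothetical hook: takes (xml_path, schema_path) and returns a list of
# schema-validation error messages (an empty list means the file is valid).
SchemaCheck = Callable[[str, str], List[str]]


def validate_xml(xml_path: str, schema_path: str, schema_check: SchemaCheck) -> List[str]:
    """Return a list of validation failures; an empty list means the file passed."""
    # Step 1: well-formedness, using only the standard library.
    try:
        ET.parse(xml_path)
    except ET.ParseError as err:
        # A file that does not parse cannot be schema-validated, so stop here.
        return [f"{xml_path} is not well-formed: {err}"]

    # Step 2: schema validity, delegated to the injected checker (assumption).
    return schema_check(xml_path, schema_path)
```

Returning a list of failures rather than raising would let a caller report every problem at once, and keeping the schema path as an input means the same task could validate PREMIS or any other XML file.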
There are some comments about this issue in https://github.com/artefactual-sdps/preprocessing-sfa/issues/22#issuecomment-2223129249.
PREMIS file looks good and it's being properly parsed into the METS. I think we can finally put this issue to bed!
Is your feature request related to a problem? Please describe.
Whether generated by Enduro (through a child workflow) or included in a SIP, PREMIS XML files should be validated before the package is sent to preservation. Archivematica/a3m can parse a PREMIS file to add the file's events to the AIP METS, but this happens quite late in the AM/a3m workflow. Validating the PREMIS file beforehand should help avoid errors at that late stage.
Describe the solution you'd like
Add a new activity to validate the premis.xml file against the PREMIS v3 schema before sending to AM/a3m, ensuring that it's well-formed and valid.
PREMIS files generated by Enduro child workflows should always be validated. A PREMIS file included in a transfer may have been validated in advance, so it might not be necessary to validate these. A reasonable approach might be to validate any PREMIS file in the SIP's metadata directory, regardless of origin, as this is the file that will be picked up by Archivematica/a3m.
Describe alternatives you've considered
None
Additional context