Closed sallain closed 1 week ago
@sallain I originally planned to try and merge the SFA Arelda metadata into the METS XML as a proper XML document with one root node and proper namespacing. I see now that SFA would like the Arelda metadata first in the document, and I've also realized that adding the Arelda XML inside the METS XML is going to be quite a bit of work. So, I've settled for now on just concatenating the two XML files with the Arelda first and the METS second. It's a work in progress (still needs testing) but I think the concatenation code should work now: https://github.com/artefactual-sdps/preprocessing-sfa/tree/dev/issue-77-combine-ais-metadata
Attached is a zipped AIS package created by Enduro with the combined AIS metadata file: search-md_little_digitized_sip-15da98b9-5953-46dd-8dc9-2b31ee544bff.zip
Note that the current name of the AIS metadata file is "AIS_1974_47_3578513" with no file extension. From the description above I think that's what the filename should be, but let me know if I should add an extension (e.g. ".xml").
Results as expected!
Is your feature request related to a problem? Please describe.
DPS must deliver both the METS file and the metadata.xml/UpdatedAreldaMetadata.xml file to the AIS during the post-preservation workflow. However, AIS only expects one file.
Describe the solution you'd like
Combine the METS and the metadata.xml/UpdatedAreldaMetadata.xml files together into one metadata file. For migration files (files identified as DigitizedAIP or BornDigitalAIP), UpdatedAreldaMetadata.xml should be used.
The newly created file should be named with the prefix `AIS_` followed by the accession number, which can be found in the metadata.xml (or UpdatedAreldaMetadata.xml, but should be the same value) under `<ablieferungsnummer>`. There should only be one ablieferungsnummer per metadata file. The number is formatted as `2002/05`; the / should be replaced with an _. The final file name will be `AIS_2002_05`.
Within the file, SFA would like the contents of metadata.xml/UpdatedAreldaMetadata.xml first, since it contains the higher hierarchies, and then the METS. The contents of the two files should probably be tagged in some way but I think it can be pretty simple - perhaps just indicating the source file.
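To make the naming rule and the ordering concrete, here is a rough Python sketch. It is only an illustration under several assumptions and is not the code in the preprocessing-sfa branch: the function names `ais_filename` and `combine` are hypothetical, the XML comment used as a source marker is just one possible way of tagging the contents, and `METS.xml` is a placeholder file name. The element lookup matches on the local tag name so it should work whether or not the Arelda file declares a namespace.

```python
import xml.etree.ElementTree as ET


def ais_filename(arelda_xml: bytes) -> str:
    """Build the AIS file name from the single <ablieferungsnummer> value."""
    root = ET.fromstring(arelda_xml)
    # Compare local tag names only, so a namespaced document also matches.
    values = [
        (el.text or "").strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "ablieferungsnummer"
    ]
    if len(values) != 1:
        raise ValueError(
            f"expected exactly one ablieferungsnummer, found {len(values)}"
        )
    # "2002/05" becomes "AIS_2002_05"; no file extension is added.
    return "AIS_" + values[0].replace("/", "_")


def combine(
    arelda_xml: bytes,
    mets_xml: bytes,
    arelda_name: str = "metadata.xml",  # or "UpdatedAreldaMetadata.xml"
    mets_name: str = "METS.xml",  # placeholder name (assumption)
) -> bytes:
    """Concatenate the Arelda metadata first, then the METS.

    Each part is preceded by an XML comment naming its source file.
    The marker format is an assumption, not something SFA has specified.
    """
    parts = []
    for source, content in ((arelda_name, arelda_xml), (mets_name, mets_xml)):
        parts.append(f"<!-- source: {source} -->\n".encode("utf-8"))
        parts.append(content.rstrip() + b"\n")
    return b"".join(parts)
```

With an Arelda file whose ablieferungsnummer is 2002/05, `ais_filename` returns `AIS_2002_05`. Note that the output of `combine` is a plain concatenation with two root elements, so it is not a single well-formed XML document; merging the two under one root would need more work.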
Describe alternatives you've considered
None
Additional context
There's a very real chance that, when operating at scale, the resulting file will be too big for AIS to handle; it might make sense to then limit which fields from each file we're combining into this new file. But we'll tackle that if/when it happens.