artefactual-sdps / preprocessing-sfa

preprocessing-sfa is an Enduro preprocessing workflow for SFA SIPs

Feature: Add full support for VECTEUR SIPs #23

Closed sallain closed 5 months ago

sallain commented 6 months ago

Is your feature request related to a problem? Please describe.

In PoC#1, we added support for transforming VECTEUR AIPs into Archivematica AIPs, including running some initial SFA-custom validation on the transfers in pre-ingest, ~combining the included PREMIS files into a single file~, restructuring the transfer, and sending it along to Archivematica in a bag. 

In PoC#2, we are now adding a similar - yet somewhat different - transfer type: VECTEUR SIPs. At a high level, the difference is: 

A full breakdown of the differences in structure and files between the Vecteur SIPs and AIPs can be seen in Miro here: https://miro.com/app/board/uXjVMlYiVgs=/?moveToWidget=3458764589942310730&cot=14

Thanks to some serendipitous confusion in PoC#1, it turns out that our current pre-ingest tasks already mostly work on Vecteur SIPs: 

However, the restructuring tasks done on the Vecteur AIP to remove unneeded files and directories (described in this PoC#1 card) do NOT currently work for Vecteur SIPs. 

This card will be to perform those outstanding cleanup tasks, and generally make sure that Vecteur SIPs are properly parsed into AM-compliant SIPs before being sent along to Archivematica.

Describe the solution you'd like

During pre-ingest, just before bagging and renaming the transfer for submission to Archivematica:

Describe alternatives you've considered

None

Additional context

NOTE: I've left references to the "combine PREMIS files" work that was done previously but has since been undone by #18. I'm not sure whether any related work here still has traces of that feature, so this is just a reminder that we're not doing that anymore.

jraddaoui commented 5 months ago

Move the following files to a "metadata" directory: header/metadata.xml, content/d_0000001/Prozess_Digitalisierung_PREMIS.xml

For both Vecteur AIPs and SIPs, taking the example above, Prozess_Digitalisierung_PREMIS.xml will be renamed to Prozess_Digitalisierung_PREMIS_d_0000001.xml in the metadata directory. This reduces the likelihood of file name conflicts when there is more than one entry in the content directory.
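
To make the move-and-rename rule concrete, here is a rough Python sketch. It is not the project's actual code. The function names `metadata_destination` and `move_to_metadata` are hypothetical, and it assumes that any file sitting directly under `content/<entry>/` gets the entry name appended to its base name, while other files keep their name.

```python
import shutil
from pathlib import Path, PurePosixPath


def metadata_destination(rel_path: str) -> str:
    """Return the assumed destination inside "metadata" for a SIP-relative path.

    Hypothetical helper: files directly under content/<entry>/ get the entry
    name appended to their base name; any other file keeps its own name.
    """
    path = PurePosixPath(rel_path)
    if len(path.parts) == 3 and path.parts[0] == "content":
        entry = path.parts[1]  # e.g. "d_0000001"
        return str(PurePosixPath("metadata") / f"{path.stem}_{entry}{path.suffix}")
    return str(PurePosixPath("metadata") / path.name)


def move_to_metadata(sip_root: Path, rel_paths: list[str]) -> list[Path]:
    """Move each listed file into <sip_root>/metadata using the rule above."""
    moved = []
    for rel in rel_paths:
        src = sip_root / rel
        dst = sip_root / metadata_destination(rel)
        # Create the metadata directory on first use.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

With the two example paths from the issue, `metadata_destination` maps `header/metadata.xml` to `metadata/metadata.xml` and `content/d_0000001/Prozess_Digitalisierung_PREMIS.xml` to `metadata/Prozess_Digitalisierung_PREMIS_d_0000001.xml`, so a second entry such as `d_0000002` would not overwrite the first.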

sallain commented 5 months ago

@jraddaoui Makes sense to me!

fiver-watson commented 5 months ago

Confirmed that the Vecteur SIP tests seem to work, that the metadata.xml file is moved, and that the Prozess file is moved and renamed!

sallain commented 5 months ago

@jraddaoui Is this issue ready to close?

jraddaoui commented 5 months ago

I'd say so.

sallain commented 5 months ago

Vecteur SIPs are being identified as expected