Open jraddaoui opened 6 months ago
@jraddaoui I agree it would be better to allow different transfer types to be sent to Archivematica, but in the current processing workflow the bundle activity will convert an incoming Bag transfer into a standard transfer which is then zipped and sent to AM (or a3m). Allowing a Bag transfer to be sent to Archivematica will require removing the bundle activity from the AM workflow or updating it to support multiple output transfer types.
Note: the conversion of Bags -> standard transfer is a decision that was made for the a3m preservation engine, and I decided to retain this convention when adding Archivematica as a preservation engine option.
I'll create another issue about the bundle activity. This is all with a view to having an extensible pre-processing option, and it will help if we have a child workflow for those activities later on. Then we should discuss where the bundle activity should be located (if it is needed at all); looking at the conceptual design, bundling seems like a responsibility of pre-processing. In the SFA fork we are skipping the bundle activity right now.
@jraddaoui okay, but I don't see any point in making the AM transfer type configurable without addressing the bundle activity: Enduro will always deliver a zipped standard transfer to AM. In the SFA case you've already modified the Enduro code, so just changing the transfer type in the code is a simpler solution than adding a config variable.
Note from today's meeting: @djjuhasz, @jraddaoui, and @sallain to review this issue and decide what pieces of work need to be completed to support SFA and MoMA.
I have a proposal for how to handle the SIP format delivered to the preservation system by Enduro. My proposal is based on the supposition that a BagIt Bag is the best SIP format for Enduro to send to the preservation system, but recognizes that a3m currently can't process Bagged SIPs.
I believe a BagIt Bag should be the preferred SIP format because:
My proposed solution for the Enduro SIP type
@sallain @jraddaoui what do you think? If you have a counter-proposal or any suggested modifications to my proposal, I'd love to hear your ideas.
I think that this is a good idea for the following reasons:
I also completely agree that this should all occur in pre-processing.
Here are a few things to consider:
I'm sure that there are other considerations as well, but for the most part I think that this is a solid proposal.
I would like to outline one consideration that is missing here: our current way of validating bags uses a very early, not well-tested BagIt library in Go. See https://github.com/nyudlts/go-bagit and https://github.com/nyudlts/go-bagit/issues/7#issuecomment-1613190552. It would require some work to make it a fully featured, spec-compliant bag validator.
@sallain I agree that we should avoid rebagging a transfer that is submitted as a Bag and that adding Bag processing to a3m ASAP would avoid having to unbag the bag we just bagged. :P
@Diogenesoftoronto yes, good points about the https://github.com/nyudlts/go-bagit library. I was assuming we would use https://github.com/LibraryOfCongress/bagit-python for Bag validation, but it being a Python tool definitely makes it more challenging to integrate than a native Go library. It also looks like bagit-python is not being actively maintained, and requires Python 2 which was sunset in January 2020.
I was discussing this with @fiver-watson and he pointed out that there may be circumstances where a user submits a bag, but other activities in the pre-processing application (e.g. transforming or adding metadata files) leave the original bag invalid, so the bag WOULD have to be rebagged. Just something to consider.
We spent some time last week hashing out a workflow diagram. This is the result. It can be found on the Implementation Services team Miro board.
@sallain the workflow diagram looks good to me. :+1:
Is your feature request related to a problem? Please describe.
Currently, all transfers started in Archivematica use the `zipfile` transfer type: https://github.com/artefactual-sdps/enduro/blob/main/internal/am/start_transfer.go#L49
This is not an issue in the current implementation, where the transfer is always bundled as a ZIP file. However, it limits the extensibility of the workflow. Consider the SFA fork, where the transfer is transformed into a zipped bag in the pre-processing activities:
https://github.com/artefactual-sdps/enduro-sfa/pull/4/files#diff-ae98fc39bbc9e053ec8d1d2ed56184cd9ba7ea280d3e72975617da81c3cfadd3
Describe the solution you'd like
Provide a configuration setting like the one used for the processing configuration:
https://github.com/artefactual-sdps/enduro/blob/main/enduro.toml#L99
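A rough sketch of what such a setting could look like on the Go side, assuming a hypothetical `TransferType` field that falls back to the current `zipfile` value when unset. The `AMConfig` struct, the `ResolveTransferType` method, and the list of accepted values are all assumptions made for illustration; they are not taken from the Enduro code, and the accepted values should be checked against the Archivematica API documentation.

```go
package main

import "fmt"

// AMConfig is a hypothetical slice of the Archivematica config section.
type AMConfig struct {
	// TransferType is the proposed setting; empty means "use the default".
	TransferType string
}

// defaultTransferType matches the value currently used for all transfers.
const defaultTransferType = "zipfile"

// knownTransferTypes is an assumed, non-exhaustive list of accepted values.
var knownTransferTypes = map[string]bool{
	"standard":     true,
	"zipfile":      true,
	"unzipped bag": true,
	"zipped bag":   true,
}

// ResolveTransferType returns the configured transfer type, the default when
// none is configured, or an error for a value outside the assumed list.
func (c AMConfig) ResolveTransferType() (string, error) {
	if c.TransferType == "" {
		return defaultTransferType, nil
	}
	if !knownTransferTypes[c.TransferType] {
		return "", fmt.Errorf("unsupported transfer type %q", c.TransferType)
	}
	return c.TransferType, nil
}

func main() {
	// The SFA case: pre-processing produces a zipped bag.
	tt, err := AMConfig{TransferType: "zipped bag"}.ResolveTransferType()
	fmt.Println(tt, err)
}
```

Validating the value at startup would surface a typo in the config file before any transfer is started.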
Describe alternatives you've considered
Allow the transfer type value to be changed in the workflow. If child workflows are used to manage that extensibility, another option would be to indicate the transfer type in the child workflow result.
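The child workflow alternative could be sketched roughly as follows, with the workflow engine left out entirely. `PreprocessingResult` and `transferTypeFor` are hypothetical names, and the precedence shown (child workflow result first, then configuration, then the current `zipfile` default) is only one possible design.

```go
package main

import "fmt"

// PreprocessingResult is a hypothetical result type returned by a
// pre-processing child workflow.
type PreprocessingResult struct {
	// Path is where the pre-processed transfer was written.
	Path string
	// TransferType is optional; empty means pre-processing has no opinion.
	TransferType string
}

// transferTypeFor picks the transfer type to send to Archivematica: the
// child workflow result wins, then the configured value, then "zipfile".
func transferTypeFor(res PreprocessingResult, configured string) string {
	if res.TransferType != "" {
		return res.TransferType
	}
	if configured != "" {
		return configured
	}
	return "zipfile"
}

func main() {
	res := PreprocessingResult{Path: "/tmp/transfer.zip", TransferType: "zipped bag"}
	fmt.Println(transferTypeFor(res, ""))
}
```

Letting the child workflow report the type keeps the decision next to the code that actually produces the package, so a fork would not need to touch the parent workflow.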