Open sromkey opened 4 years ago
I see merit in requiring every file to have a checksum: it is a good way to make sure nothing is accidentally added to the transfer that wasn't intended. Conversely, it lets you know if you are missing something you intended to include, so the checksum file doubles as a manifest. I would imagine most people who include checksums with a transfer would generate them for all of their files. Plus, it is an optional step, so people who find it too much overhead do not have to include any. However, whichever way we go, we should make sure the documentation reflects the decision.
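Treating the checksum file as a manifest amounts to two set differences between the paths it lists and the paths actually in the transfer. A minimal Python sketch, assuming both sides have already been normalized to the same relative paths; `compare_manifest` is a hypothetical helper, not Archivematica code:

```python
def compare_manifest(listed_paths, present_paths):
    """Return (missing, extra) as sorted lists.

    missing: listed in the checksum file but not present in the transfer
    extra:   present in the transfer but not listed in the checksum file
    """
    listed = set(listed_paths)
    present = set(present_paths)
    missing = sorted(listed - present)
    extra = sorted(present - listed)
    return missing, extra


missing, extra = compare_manifest(
    ["objects/a.wav", "objects/b.wav"],  # paths named in the checksum file
    ["objects/a.wav", "objects/c.mp3"],  # paths found in the transfer
)
print(missing)  # ['objects/b.wav']
print(extra)    # ['objects/c.mp3']
```

A strict check fails if either list is non-empty; the question in this issue is whether `extra` alone should be fatal.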
For those of us who want to verify every file and be notified if anything is extra or missing, we will likely use a BagIt structure.
Given this, I'm less concerned if other ways of supplying checksums for transfers don't also check for membership mismatches.
My thinking is along the same lines as @RussellMcOrmond: if you're truly concerned about a complete manifest, you'd be using bags.
Good points, @RussellMcOrmond and @sromkey. There are some automated workflows that pass checksums but don't create bags, so I would be interested in what their users think. I'd also be curious to know how often this issue has come up for people who supply checksums only.
My primary experience has been with a workflow where the manifest itself needed to be accurate to the files transferred. In the examples I worked with, we wouldn't ask the agency to bag the material, as we would work on the bag ourselves at the archives. Though the agency could conceivably use BagIt, we'd be passing another requirement downstream when we need to make transfer as easy as possible. If we could avoid it, we also wouldn't choose to rebag locally, i.e. a workflow where the bag is validated first and the agency's items are then checked again against a metadata/checksum.algorithm file. I feel both transfer types have merit precisely because they are different transfer types in AM.
Internally, I am also working on a workflow where items are cherry-picked into a new transfer layout, and I want to use the 1:1 mapping between the checksum file and the files to verify that move for the client when it is triggered.
In principle, I am not against changing the current behavior to be more flexible, but I hope we could consider providing a strict vs. non-strict mode. I'm also interested in how we would make it clear to users that only a partial manifest was validated.
:+1: for a strict vs. non-strict mode, that would be a good solution to this (and resolve my decision-itis, haha).
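A rough Python sketch of what a strict vs. non-strict switch might look like. The function names, the `strict` flag, and the report fields are hypothetical, not part of Archivematica, and it assumes the checksum file has already been parsed into a `{relative_path: md5}` dict. In both modes a listed file that is missing or whose digest differs still fails; the flag only decides whether unlisted files fail the check, and the `checked`/`total` counts are one way to surface that only part of the transfer was verified:

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(base_dir, expected, strict=True):
    """Check files under base_dir against expected {relative_path: md5}.

    Hypothetical modes:
    - strict=True: also fail if any file in base_dir is unlisted.
    - strict=False: only listed files are checked; unlisted files are
      reported but do not cause a failure.
    """
    base = Path(base_dir)
    present = {p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()}
    listed = set(expected)
    missing = sorted(listed - present)
    unlisted = sorted(present - listed)
    mismatched = sorted(
        rel for rel in listed & present if md5_of(base / rel) != expected[rel].lower()
    )
    ok = not missing and not mismatched and (not strict or not unlisted)
    return {
        "ok": ok,
        "missing": missing,
        "mismatched": mismatched,
        "unlisted": unlisted,
        # Counts a report could use to say "1 of 2 files checked".
        "checked": len(listed & present),
        "total": len(present),
    }


# Usage: a WAV with a supplied checksum plus an MP3 derivative without one.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "tape1.wav").write_bytes(b"audio")
    Path(tmp, "tape1.mp3").write_bytes(b"access copy")
    expected = {"tape1.wav": hashlib.md5(b"audio").hexdigest()}
    strict_report = verify_transfer(tmp, expected, strict=True)
    lenient_report = verify_transfer(tmp, expected, strict=False)

print(strict_report["ok"], strict_report["unlisted"])  # False ['tape1.mp3']
print(lenient_report["ok"], lenient_report["checked"], lenient_report["total"])  # True 1 2
```

Keeping "missing" and "mismatched" fatal in both modes means non-strict mode never hides a bad checksum; it only stops unlisted files from failing the transfer.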
I've been testing this issue when it comes to the access derivatives workflow and thought I'd add my two cents. If you want to make use of the access workflow (or the manualNormalization workflow including pre-existing preservation derivatives) and also have existing checksums for just the originals, the transfer also fails. Of course, the workaround is to generate checksums for the derivatives and include them in the checksum sidecar file.
The example case I'm working with is when a digitization vendor provides preservation copies of digitized audio files in WAV plus access copies in MP3 that can be passed to the DIP with the access derivatives workflow. The provider also includes a metadata file with MD5 checksums for the preservation copies. A non-strict process here would probably help an archivist for whom generating additional checksums for the derivatives would add an extra step or be a bit of a hurdle. But I can also see a benefit to including checksums for all files, especially if additional preservation derivatives are also included.
^ This is a helpful comment, @gehurley, thank you!
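For the vendor case above, a middle ground would be to keep strict checking but exempt paths that are expected to arrive without checksums, such as access derivatives. A minimal Python sketch; the helper name and the exemption pattern are hypothetical, and the `manualNormalization/access/` location is an assumption about where the MP3s sit in the transfer:

```python
from fnmatch import fnmatchcase


def files_requiring_checksums(paths, exempt_patterns):
    """Return the paths that must be listed in the checksum file.

    Paths matching any exempt pattern (shell-style, case-sensitive) are
    skipped. Note that "*" in fnmatch also matches "/".
    """
    return sorted(
        p for p in paths
        if not any(fnmatchcase(p, pattern) for pattern in exempt_patterns)
    )


transfer_files = [
    "objects/tape1.wav",
    "objects/tape2.wav",
    "manualNormalization/access/tape1.mp3",
    "manualNormalization/access/tape2.mp3",
]
required = files_requiring_checksums(
    transfer_files, exempt_patterns=["*manualNormalization/access/*"]
)
print(required)  # ['objects/tape1.wav', 'objects/tape2.wav']
```

Only the WAV preservation copies would then need entries in the vendor's MD5 file, while any extra file outside the exempt paths would still fail the check.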
Had a similar issue to @gehurley. Would there be any way to flag the checksum file for review when it fails, instead of killing the whole process? I'm not sure what limitations there would be around building something like this, but if a checksum file is flagged because it doesn't have checksums for all the files, could users be given a few options? (1) Prompt Archivematica to update or overwrite the incomplete MD5 file so that it includes checksums for files not listed in the original, and then generate a PREMIS event for the creation of a new MD5 file or the update of the existing one. Or (2) kill the operation, if it really is an issue and you want to correct it.
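Option (1) could look roughly like the Python sketch below: compute MD5s for files the checksum file does not list, append them in md5sum's `<digest>  <path>` format, and return a record of the change. `fill_missing_checksums` is hypothetical, and the returned dict is only loosely modelled on a PREMIS event; it is not Archivematica's data model or API:

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def md5_of(path):
    """MD5 hex digest of a file (reads the whole file; fine for a sketch)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def fill_missing_checksums(base_dir, checksum_path):
    """Append md5sum-style lines for files under base_dir that are not listed.

    Returns a hypothetical event record, or None if nothing was added.
    The checksum file itself is never checksummed.
    """
    base = Path(base_dir)
    checksum_path = Path(checksum_path)
    existing = checksum_path.read_text() if checksum_path.exists() else ""
    # md5sum lines look like "<digest>  <path>" (or "<digest> *<path>").
    listed = set()
    for line in existing.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            listed.add(parts[1].lstrip("*"))
    added = []
    for p in sorted(base.rglob("*")):
        if not p.is_file() or p.resolve() == checksum_path.resolve():
            continue
        rel = p.relative_to(base).as_posix()
        if rel not in listed:
            added.append(f"{md5_of(p)}  {rel}")
    if not added:
        return None
    # Keep existing entries and make sure appended lines start on a new line.
    separator = "" if not existing or existing.endswith("\n") else "\n"
    checksum_path.write_text(existing + separator + "\n".join(added) + "\n")
    return {
        "eventType": "message digest calculation",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": f"added checksums for {len(added)} file(s) to {checksum_path.name}",
    }


# Usage: the WAV is listed, the MP3 derivative is not.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "objects").mkdir()
    Path(tmp, "objects", "tape1.wav").write_bytes(b"audio")
    Path(tmp, "objects", "tape1.mp3").write_bytes(b"access copy")
    manifest = Path(tmp, "checksum.md5")
    manifest.write_text(hashlib.md5(b"audio").hexdigest() + "  objects/tape1.wav\n")
    event = fill_missing_checksums(tmp, manifest)
    second_event = fill_missing_checksums(tmp, manifest)  # nothing left to add
    lines = manifest.read_text().splitlines()
```

Appending rather than overwriting keeps the vendor's original entries intact, so the supplied checksums are still the ones verified for the preservation copies.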
Expected behaviour
Some use cases might require a checksum for only part of a transfer, rather than a complete manifest.
Current behaviour
Verify Transfer Checksums will fail if there is not a checksum for every file in the transfer.
Steps to reproduce
Discussion
I can't decide what the desired behaviour would be. On the one hand, it might be nice to check as many checksums as you have, even if the list isn't complete. On the other hand, users might find it misleading for the micro-service to pass when there was not a complete list of checksums. I'm interested in hearing other opinions on this.
Your environment (version of Archivematica, operating system, other relevant details)
Tested on 1.10.1.