archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: DIPs are not cleaned up from watched directories #1665

Closed replaceafill closed 4 months ago

replaceafill commented 7 months ago

Expected behaviour

DIPs are always removed from watched directories after a transfer finishes.

Current behaviour

DIPs are placed in two watched directories:

In the following scenarios, at least one of these copies is left behind:

And although it's not related to watched directories there is also a conflict between rejecting the AIP and the DIP at the same time:

Steps to reproduce

  1. Set the processing configuration to create access copies.
  2. Upload a DIP to an access system.
  3. Do not store the DIP.
  4. Check the uploadDIPs and uploadedDIPs watched directories after the transfer finishes. You'll see both copies of the DIP.
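
The check in step 4 can be scripted. The following is a minimal sketch, not part of Archivematica: the function name `leftover_dips` is hypothetical, and the root path in the usage example is an assumption about where the watched directories live on a given install. The two directory names are taken from the step above.

```python
from pathlib import Path


def leftover_dips(watched_root, names=("uploadDIPs", "uploadedDIPs")):
    """Return {directory name: sorted entry names} for each watched directory.

    A directory that does not exist is reported with an empty list.
    """
    root = Path(watched_root)
    found = {}
    for name in names:
        directory = root / name
        if directory.is_dir():
            found[name] = sorted(p.name for p in directory.iterdir())
        else:
            found[name] = []
    return found


if __name__ == "__main__":
    # Assumed location of the watched directories; adjust for your install.
    report = leftover_dips("/var/archivematica/sharedDirectory/watchedDirectories")
    for name, entries in report.items():
        print(f"{name}: {len(entries)} entries {entries}")
```

If the problem described here is present, both directories should list an entry for the DIP after the transfer finishes.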

Your environment (version of Archivematica, operating system, other relevant details)

https://github.com/artefactual/archivematica/commit/2a13924d724a0f9fb4297f995441d6f02368803f


replaceafill commented 4 months ago

We tested and compared these scenarios using the am115jammy and am116jammy servers and they work as expected now.

fitnycdigitalinitiatives commented 4 months ago

Hello,

I think this has inadvertently broken the ContentDM upload process. If the Archivematica transfer completes before the ContentDM upload is initiated, the DIP stored in uploadedDIPs is deleted and thus cannot be uploaded/transferred to ContentDM.

sromkey commented 4 months ago

@fitnycdigitalinitiatives thanks for pointing this out. I can think of three workarounds:

  1. In addition to configuring Upload DIP to ContentDM, also set your processing configuration to store the DIP and retrieve it from the Storage Service.
  2. Set your processing configuration to not store the AIP until you choose to do so manually, to give yourself time to retrieve the DIP.
  3. Set your processing configuration to not store the DIP; you can then retrieve it from the Rejected directory (seems like kind of a hack!).

I hope one of these works for you. Apologies for forcing a change in your workflow. We're trying to be more consistent about the separation of concerns between the Archivematica dashboard and the Storage Service and not rely so much on leaving things in watched directories, which in our experience causes performance and cleanup problems. This also reminds me that we should update the docs to reflect this, so thanks for that also!

fitnycdigitalinitiatives commented 4 months ago

Thanks Sarah, I did actually end up just tweaking our automation script to grab it from the rejected directory. Everything works the same in the end and it is a bit tidier with this workflow.
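
For readers who want to try the same workaround, here is a rough sketch of what such a retrieval step might look like. It is not the script mentioned above, which is not shown in this thread. The function name `fetch_rejected_dip` is hypothetical, and the sketch assumes that the rejected DIP is a directory whose name ends with the SIP UUID; the actual location of the rejected directory depends on the install and is passed in as a parameter.

```python
import shutil
from pathlib import Path


def fetch_rejected_dip(rejected_dir, sip_uuid, destination):
    """Copy the rejected DIP whose directory name ends with sip_uuid.

    Returns the path of the copy, or None if no matching directory exists.
    The "name ends with the UUID" convention is an assumption.
    """
    matches = sorted(
        p for p in Path(rejected_dir).iterdir()
        if p.is_dir() and p.name.endswith(sip_uuid)
    )
    if not matches:
        return None
    target = Path(destination) / matches[0].name
    # copytree creates the target and raises FileExistsError if it already exists.
    shutil.copytree(matches[0], target)
    return target
```

Copying rather than moving leaves the rejected directory untouched, so the step can be retried with a different destination if the ContentDM upload fails.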