archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: DSpace transfers fail checksum check if "Delete packages after extraction" is set to "Yes" #437

Open sallain opened 5 years ago

sallain commented 5 years ago

Expected behaviour If a DSpace transfer has sidecar material (i.e. research data), it is exported from DSpace as a .zip file within the transfer directory. Using Archivematica, I should be able to extract this package and then delete the package after extraction in order to save space in my AIP.

Current behaviour If the processing configuration has "Delete packages after extraction" set to "Yes", the transfer fails as described below.

The transfer fails at Job: Verify checksums in fileSec of DSpace METS files. This job uses the checksums provided by DSpace to verify the files, but by the time it runs the .zip file has already been deleted. The transfer fails with the following message:

Traceback (most recent call last):
  File "/usr/lib/archivematica/MCPClient/clientScripts/verifyChecksumsInFileSecOfDspaceMETSFiles.py", line 72, in <module>
    ret = verifyMetsFileSecChecksums(metsFile, date, taskUUID, relativeDirectory=os.path.dirname(metsFile) + "/")
  File "/usr/lib/archivematica/MCPClient/clientScripts/verifyChecksumsInFileSecOfDspaceMETSFiles.py", line 49, in verifyMetsFileSecChecksums
    checksum2 = get_file_checksum(fileFullPath, checksumType)
  File "/usr/lib/archivematica/archivematicaCommon/archivematicaFunctions.py", line 175, in get_file_checksum
    with open(filename, 'rb') as f:
IOError: [Errno 2] No such file or directory: '/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/examineContentsChoice/my-transfer-here/path/researchdata.zip'

Steps to reproduce Start a DSpace transfer that contains a .zip file, and make sure that the processing config is set as above.

Your environment (version of Archivematica, OS version, etc) All


sallain commented 5 years ago

Some more information about this failure. It occurs because DSpace transfers have a microservice called Microservice: Identify DSpace files, which runs a special checksum check job using the DSpace-generated checksums. Moving this microservice so that it runs before Extract packages would prevent the failure.
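To illustrate why the ordering matters, here is a minimal sketch in Python. It is not Archivematica's code: every function name, the manifest layout, and the sample file are hypothetical, and it only mimics the two orderings on a toy transfer with one nested package.

```python
import hashlib
import os
import tempfile
import zipfile


def file_checksum(path, algorithm="md5"):
    """Return the hex digest of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(base_dir, expected):
    """Return the relative paths whose checksum does not match.

    `expected` maps relative path -> (algorithm, hex digest); this layout
    is an assumption for the sketch. A missing file raises an error.
    """
    mismatches = []
    for rel_path, (algorithm, digest) in expected.items():
        actual = file_checksum(os.path.join(base_dir, rel_path), algorithm)
        if actual != digest:
            mismatches.append(rel_path)
    return mismatches


def extract_and_delete(zip_path):
    """Extract a package next to itself, then delete the package."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(os.path.splitext(zip_path)[0])
    os.remove(zip_path)


def build_sample_transfer(base_dir):
    """Create a toy transfer with one nested package; return its manifest."""
    zip_path = os.path.join(base_dir, "researchdata.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("data.csv", "a,b\n1,2\n")
    return {"researchdata.zip": ("md5", file_checksum(zip_path, "md5"))}


def run_verify_first():
    """Proposed order: verify, then extract and delete."""
    with tempfile.TemporaryDirectory() as base_dir:
        expected = build_sample_transfer(base_dir)
        mismatches = verify_checksums(base_dir, expected)
        extract_and_delete(os.path.join(base_dir, "researchdata.zip"))
        return mismatches


def run_extract_first():
    """Current order: extract and delete, then verify (raises)."""
    with tempfile.TemporaryDirectory() as base_dir:
        expected = build_sample_transfer(base_dir)
        extract_and_delete(os.path.join(base_dir, "researchdata.zip"))
        return verify_checksums(base_dir, expected)
```

In this sketch, run_verify_first() completes with no mismatches, while run_extract_first() raises FileNotFoundError, the Python 3 counterpart of the IOError in the traceback.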

Alternatively, the checksum check could be configured to ignore the missing .zip file (or files), though I think this is the more dangerous route.
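A rough sketch of what "ignore the missing file" could look like, again in Python with hypothetical names and a made-up manifest layout (relative path mapped to algorithm and digest); it is not taken from the Archivematica code base.

```python
import hashlib
import os
import tempfile


def verify_checksums_tolerant(base_dir, expected):
    """Verify files against `expected` (relative path -> (algorithm, digest)).

    Returns (mismatches, skipped): files whose digest differs, and files
    that were absent and therefore never checked.
    """
    mismatches, skipped = [], []
    for rel_path, (algorithm, digest) in expected.items():
        full_path = os.path.join(base_dir, rel_path)
        if not os.path.isfile(full_path):
            # Assumed behaviour: record the absence instead of failing.
            skipped.append(rel_path)
            continue
        hasher = hashlib.new(algorithm)
        with open(full_path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                hasher.update(chunk)
        if hasher.hexdigest() != digest:
            mismatches.append(rel_path)
    return mismatches, skipped


def demo():
    """Toy transfer: one file present, one package already deleted."""
    with tempfile.TemporaryDirectory() as base_dir:
        with open(os.path.join(base_dir, "thesis.pdf"), "wb") as f:
            f.write(b"example bytes")
        expected = {
            "thesis.pdf": ("md5", hashlib.md5(b"example bytes").hexdigest()),
            "researchdata.zip": ("md5", "0" * 32),  # placeholder digest
        }
        return verify_checksums_tolerant(base_dir, expected)
```

The danger is visible here: a package that was deleted on purpose and one that went missing by accident both end up in the skipped list, so the caller would at least need to log or review that list rather than discard it.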

ross-spencer commented 5 years ago

Related: we had a similar issue with Dataverse and opted for the second approach, primarily because of a lack of time at the end of the project and the amount of scope we had for change. For the reader, the PR containing that change is here. The original issue is here.

ablwr commented 5 years ago

Testing this on 1.10.x, this no longer seems to be an issue. Can I close this? Or perhaps someone else can test and verify that it's not a problem anymore.

sallain commented 5 years ago

@ablwr I don't think any of our DSpace transfers contain .zip files. The DSpace sample transfer is a bit misleading - each of the .zip files is a DSpace package, but they don't contain packages themselves. It's the nested packages that are the issue (sorry, it's confusing!)

I will try to get my hands on some client test data to check this on 1.10.

ablwr commented 5 years ago

Ohhhhh I see, I see. OK!