eprintsug / EPrintsArchivematica

Digital Preservation through EPrints-Archivematica Integration - An EPrints export plugin to Archivematica
checksum mismatch on derivative file: indexcodes.txt #41

Closed: photomedia closed this issue 2 years ago

photomedia commented 2 years ago

It is unclear what to do when we encounter a checksum mismatch on the indexcodes.txt derivative file. This is not a bug in the plugin; it is more of a question for the EPrints community: what can we do, ideally using existing epadmin commands, to rehash an indexcodes.txt file?

photomedia commented 2 years ago

I have opened a related issue in EPrints core: https://github.com/eprints/eprints3.4/issues/201. The great news is that @drn05r proposed an enhancement to epadmin that would allow us to regenerate the hash for a file. Thank you!

photomedia commented 2 years ago

An enhancement that emerged from this: the plugin should record the fileid of each file, including derivative files, in the processing log. That way we would immediately know the fileid of any file that throws a mismatch. The fileid is useful for troubleshooting, for example when running the epadmin redo_hash function on the file.

photomedia commented 2 years ago

It looks like the error with this specific indexcodes.txt file is caused by an incorrect filesize in the EPrints file object for this file. If the recorded filesize is incorrect (smaller than the actual filesize), the stored checksum will also be incorrect.
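
For anyone who wants to confirm this kind of mismatch outside EPrints, here is a rough Python sketch. It is not part of the plugin or of epadmin. The function names `check_file` and `md5_of_prefix` are made up for illustration, MD5 is assumed as the hash type, and the recorded size and hash are assumed to have been copied by hand from the EPrints file object. The second helper illustrates one possible explanation, assumed here and not confirmed in this thread: a hash computed over only the recorded (too small) number of bytes will not match the hash of the whole file.

```python
import hashlib
import os


def check_file(path, recorded_size, recorded_md5, chunk_size=65536):
    """Compare a file on disk with the size and MD5 recorded for it.

    `recorded_size` and `recorded_md5` are assumed to come from the
    EPrints file object (copied by hand; this sketch does not query EPrints).
    """
    actual_size = os.path.getsize(path)
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    actual_md5 = digest.hexdigest()
    return {
        "actual_size": actual_size,
        "actual_md5": actual_md5,
        "size_matches": actual_size == recorded_size,
        "md5_matches": actual_md5 == recorded_md5.lower(),
    }


def md5_of_prefix(path, length):
    """MD5 of only the first `length` bytes of the file.

    Hypothetical helper: if this equals the stored checksum, the stored
    hash was probably computed over a truncated view of the file.
    """
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read(length)).hexdigest()
```

If `size_matches` is false and `md5_of_prefix(path, recorded_size)` equals the stored checksum, that would support the truncated-size explanation; if not, the cause is likely something else.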