eprintsug / EPrintsArchivematica

Digital Preservation through EPrints-Archivematica Integration - An EPrints export plugin to Archivematica

process_transfers throws error on files with some accented letters in the file name #38

Closed by photomedia 2 years ago

photomedia commented 2 years ago

The comparison against existing hashes fails if the deposited file's name contains certain accented characters, for example:

collectivités_traumatisés.pdf

To reproduce the error, upload (deposit) any file with the above name and run process_transfers on it.

The failure report will look something like this:

...
* [1] Copy - '/opt/eprints3/archives/.../collectivités_traumatisés.pdf' '/mnt/auto-transfers/.../objects/documents/documentid-XXXX/files/XXXXX/collectivités_traumatisés.pdf'
...
* [0] ERROR - checksum MISSING  - /mnt/auto-transfers/.../objects/documents/documentid-XXX/files/XXXX/collectivités_traumatisés.pdf
PROBLEM DETECTED WITH 6426 CANCELLING

This points to an encoding issue that prevents the existing hash from being recognized.

The fix will probably be needed in this file: lib/plugins/EPrints/Plugin/Export/Archivematica/EPrint.pm

photomedia commented 2 years ago

Confirmed the bug on another development repository. Changing a single line resolves the issue, namely this line:

https://github.com/eprintsug/EPrintsArchivematica/blob/b309ece59555eba51bbe1a1922e3c8b1305c976a/lib/plugins/EPrints/Plugin/Export/Archivematica/EPrint.pm#L452

Changing it to the following fixes it:

    push @$file_paths, decode( 'utf8', $path );
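A minimal standalone Perl sketch of why the one-line change works. It assumes (this is not taken from the plugin) that the existing checksums are looked up by file name as Perl character strings, while the collected paths arrive as raw UTF-8 bytes from the filesystem. The hash `%checksums` and its placeholder value are hypothetical stand-ins, and `decode` is `Encode::decode`, which must be in scope for the fixed line to compile.

```perl
use strict;
use warnings;
use utf8;                      # string literals in this file are UTF-8 characters
use Encode qw(decode encode);

# Hypothetical checksum table keyed by character-string file names,
# standing in for the hashes the plugin already knows about.
my %checksums = (
    'collectivités_traumatisés.pdf' => 'placeholder-checksum',
);

# Simulate a path as returned by the filesystem (readdir, File::Find):
# raw UTF-8 octets rather than Perl characters.
my $raw_path = encode( 'UTF-8', 'collectivités_traumatisés.pdf' );

# Lookup with the raw octets fails: each 'é' is one character in the key
# but two bytes (0xC3 0xA9) in $raw_path.
my $found_raw = exists $checksums{$raw_path} ? 1 : 0;

# Decoding the octets back into characters, as the fixed line does,
# makes the lookup succeed.
my @file_paths;
push @file_paths, decode( 'utf8', $raw_path );
my $found_decoded = exists $checksums{ $file_paths[0] } ? 1 : 0;

binmode STDOUT, ':encoding(UTF-8)';
print "raw lookup:     ", ( $found_raw     ? 'found' : 'MISSING' ), "\n";
print "decoded lookup: ", ( $found_decoded ? 'found' : 'MISSING' ), "\n";
```

Running it prints `MISSING` for the raw lookup and `found` for the decoded one, mirroring the "checksum MISSING" error above. Files with purely ASCII names are unaffected, because their bytes and characters coincide.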

photomedia commented 2 years ago

Added a fix for this and included it in the first official release, version 1.2.2.