eprintsug / EPrintsArchivematica

Digital Preservation through EPrints-Archivematica Integration - An EPrints export plugin to Archivematica

apostrophe, double-quote, colon in file name of deposited file causes an error #40

Closed: photomedia closed this issue 2 years ago

photomedia commented 2 years ago

Another case that triggers an error: an apostrophe in the file name of a deposited file causes the export to fail.

To reproduce the error, upload a file with an apostrophe in the name, for example: l'Universite_Concordia.pdf

We need to escape any apostrophes in the copy calls.

photomedia commented 2 years ago

Investigating this further, I tested with the file 'collectivités_traumatisé's.pdf'.

The copy operation fails on this line: https://github.com/eprintsug/EPrintsArchivematica/blob/56fb33acdd2bb10005085cb6a6c1fe555298752b/lib/plugins/EPrints/Plugin/Export/Archivematica/EPrint.pm#L118

The error is caught, but its details are not displayed. Tracing the error shows that the underlying message is: Copy failed: No such file or directory

The copy fails because the source file is not found.

It is not found because the name actually stored at that location on disk is different: 'collectivités_traumatisé=0027s.pdf'

So does EPrints substitute apostrophes in file names with "=0027" when it saves them to disk?
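One observation consistent with this: 0027 is the hexadecimal Unicode code point of the apostrophe (U+0027). The check below is a minimal sketch in Python, used purely for illustration (the plugin itself is Perl). The helper name code_point_escape is hypothetical, and the "=" plus four hex digits format is inferred only from the file name seen on disk, not from EPrints documentation.

```python
def code_point_escape(ch: str) -> str:
    """Format one character as '=' followed by its 4-digit hex code point.

    Assumption: this mirrors the pattern observed on disk; it is not
    taken from the EPrints source.
    """
    return "={:04x}".format(ord(ch))


# The apostrophe is U+0027, which matches the "=0027" seen on disk.
print(code_point_escape("'"))  # prints "=0027"
```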

photomedia commented 2 years ago

I added a regex to account for this transformation of the file name on disk, and the error log now includes more details if the copy operation fails again. This resolved the issue on my repository.
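The idea of reporting the underlying reason when a copy fails can be sketched as follows. This is an illustration in Python, not the plugin's Perl code. The function name copy_with_details and the logger name are hypothetical.

```python
import logging
import shutil

log = logging.getLogger("export_sketch")  # hypothetical logger name


def copy_with_details(src: str, dst: str) -> bool:
    """Copy src to dst; on failure, log the reason and both paths."""
    try:
        shutil.copy(src, dst)
        return True
    except OSError as err:
        # Include the system error and the paths, so that a missing
        # source file shows up directly in the log.
        log.error("Copy failed: %s (source: %s, destination: %s)", err, src, dst)
        return False
```

With the source path in the message, a mismatch between the expected file name and the name on disk becomes visible without further tracing.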

photomedia commented 2 years ago

This fix will be included in the next release, 1.2.3.

photomedia commented 2 years ago

A double quote causes the same failure. To reproduce, deposit a file with this filename: As"ad.pdf. On disk, the " is replaced by =0022.

photomedia commented 2 years ago

The colon is another special character that gets replaced, in addition to the apostrophe and the double quote: =003a replaces any : in a filename. The regex needs to cover that as well.
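A substitution covering the three characters reported in this thread could look roughly like the sketch below. It is written in Python for illustration only and is not the regex used in the plugin. The names ON_DISK and on_disk_name are hypothetical, and the mapping contains only the replacements observed above; EPrints may transform other characters too.

```python
import re

# Replacements observed on disk in this thread (assumed incomplete).
ON_DISK = {"'": "=0027", '"': "=0022", ":": "=003a"}

# Match any one of the three reported characters.
_PATTERN = re.compile("['\":]")


def on_disk_name(filename: str) -> str:
    """Return the file name as it is expected to appear on disk."""
    return _PATTERN.sub(lambda m: ON_DISK[m.group(0)], filename)


print(on_disk_name("l'Universite_Concordia.pdf"))
# prints "l=0027Universite_Concordia.pdf"
```

Keeping the mapping in one table means a newly discovered character needs one new entry plus an addition to the character class.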