Problem: Change name microservice gets confused with similar names with diacritics (Café and Café) causing a trickle down effect until METS generation failure #1352
Expected behaviour
Current behaviour
These two names look the same:
But in hex they look as follows:
We can deconstruct them into their component Unicode code points and see that the two 'e' letters with diacritics are created using different symbols:
* LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
* LATIN SMALL LETTER E WITH ACUTE
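For anyone wanting to check this themselves, here is a minimal sketch using Python's standard `unicodedata` module. The two strings are illustrative stand-ins written with explicit escapes, not the exact file names from the zip, and `describe` is a helper made up for this example.

```python
import unicodedata

# Two spellings of the same visible name; escapes make the difference explicit.
decomposed = "Cafe\u0301"   # 'e' followed by U+0301
precomposed = "Caf\u00e9"   # single code point U+00E9


def describe(name):
    """Return the UTF-8 bytes as hex and the Unicode name of each code point."""
    utf8_hex = " ".join(f"{byte:02x}" for byte in name.encode("utf-8"))
    names = [unicodedata.name(ch) for ch in name]
    return utf8_hex, names


for label, value in (("decomposed", decomposed), ("precomposed", precomposed)):
    utf8_hex, names = describe(value)
    # Only the code points after "Caf" differ, so print just those names.
    print(label, "|", utf8_hex, "|", names[3:])

# The raw strings are not equal, but they are after normalising to one form.
print(decomposed == precomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

The last two lines show the core of the problem: the names are different strings as stored, and only compare equal once both are normalised to the same form (NFC here).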
This is a ready-made transfer which will cause the transfer process to fail once extracted and run: cafe_fail.zip
Steps to reproduce
Run the two files in the zip above as a transfer. It will fail during METS generation in the ingest tab as two results are returned from the database for the same file:
MultipleObjectsReturned: get() returned more than one File -- it returned 2!
If you look through the other microservice jobs you will also see various database failures, e.g. in file format identification, for one of the file objects:
DoesNotExist: File matching query does not exist.
Your environment (version of Archivematica, operating system, other relevant details)
Archivematica 1.12. Docker, and at a client site.
Additional context
The change name microservice seems to be creating this issue as we know that diacritics are not preserved on the file system and instead are maintained in the database and metadata.
By the time these files are processed and written to the database they end up with the same current location, even though other properties are different (nb, the pipe alignment in the cells below hasn't been introduced by me).
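To illustrate how two distinct names can end up sharing one current location, here is a small self-contained simulation. It is a sketch only: `strip_diacritics`, `FileRecord` and `get_one` are hypothetical stand-ins written for this example and are not Archivematica's actual name-changing code or its Django models. The assumption is simply that the sanitiser drops combining marks.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class FileRecord:
    uuid: str
    original_name: str
    current_location: str


def strip_diacritics(name):
    """Hypothetical sanitiser: decompose the name, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def get_one(records, current_location):
    """Mimic a single-result lookup: exactly one match, otherwise an error."""
    matches = [r for r in records if r.current_location == current_location]
    if len(matches) != 1:
        raise LookupError(f"expected 1 record, found {len(matches)}")
    return matches[0]


# Illustrative names: one decomposed, one precomposed.
names = ["Cafe\u0301.txt", "Caf\u00e9.txt"]
records = [
    FileRecord(uuid=f"file-{i}", original_name=n, current_location=strip_diacritics(n))
    for i, n in enumerate(names)
]

# Both records now point at the same sanitised location.
print([r.current_location for r in records])

try:
    get_one(records, "Cafe.txt")
except LookupError as err:
    print("lookup failed:", err)
```

Under that assumption, any later step that looks a file up by its current location alone finds two rows, which is consistent with the `MultipleObjectsReturned` error seen during METS generation.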
I have tracked this down as much as I have energy for this evening, and it seems the correct data all exists up until here in the workflow, at which point the dictionary lookup fails for the information the microservice job is trying to find.
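One way a dictionary lookup like this can miss is sketched below. It assumes the keys are compared as raw strings and that one stage recorded the precomposed form while another asks for the decomposed form; I have not confirmed that this is what the microservice job does. The `index` contents and the `nfc` helper are made up for the example.

```python
import unicodedata


def nfc(name):
    """Normalise a name to NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", name)


# Index keyed by the name as one stage recorded it (precomposed form).
index = {"Caf\u00e9.txt": "file-1"}

# Another stage asks for the same visible name in decomposed form.
requested = "Cafe\u0301.txt"

print(requested in index)  # False: the raw strings differ

# Normalising both the keys and the requested name makes the lookup succeed.
normalised_index = {nfc(key): value for key, value in index.items()}
print(normalised_index.get(nfc(requested)))  # file-1
```

Normalising keys only helps when the two spellings are meant to be the same file. When both forms exist in the same transfer, as they do here, normalised keys would collide, so the lookup would need to be keyed on something unique such as the file UUID.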
Here is some logging I put in to confirm this: