Open ablwr opened 5 years ago
For testing, the University of Washington graciously provided us with two test files. They are the same image, but one has had the ExifTool data stripped out of it (which also cut the file size in half).
PH1486ColmanJ_028.jpg_original.jpeg and PH1486ColmanJ_028-repaired.jpg
Expected behaviour
The Characterize & Extract microservice should run successfully on each file, running each specified extraction tool.
Current behaviour
Raw JPEG files with a lot of metadata produce very large ExifTool output. I guess 12,736 lines of XML is too much for Archivematica to handle. But instead of failing in a known or elegant way, the hefty file fails at some point (I am guessing at the point of writing to the database), and it not only fails itself but also causes adjacent files to "fail" the Characterize & Extract microservice.
The result is that the C&E microservice does not float errors to the top, and the Exit Code for these and other (random, changing, inconsistent) files is marked as None.
I noted one difference between the outputs in the ExifTool versions in the System:FileModifyDate and related fields, but I think that difference is inconsequential. The lines of XML are the same.
Steps to reproduce
Get a file with a whole lot of metadata. I can provide a sample privately, with approval from the client encountering this issue.
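To check whether a candidate file has "a whole lot of metadata" before ingesting it, something like the following rough sketch could be used. It assumes the exiftool binary is on the PATH and uses ExifTool's -X (XML output) option; whether Archivematica invokes ExifTool in exactly this way is an assumption, and the helper names summarize_output and exiftool_xml are made up for illustration.

```python
import subprocess


def summarize_output(xml_text: str) -> dict:
    """Return the line count and UTF-8 byte count of a tool's XML output."""
    return {
        "lines": len(xml_text.splitlines()),
        "bytes": len(xml_text.encode("utf-8")),
    }


def exiftool_xml(path: str) -> str:
    """Run ExifTool with XML output (-X) and return stdout as text.

    Assumption: exiftool is installed and on PATH. A non-zero exit
    status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        ["exiftool", "-X", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling summarize_output(exiftool_xml(path)) on the original and the repaired file should show how far apart the two outputs are; a line count in the region of the 12,736 lines mentioned above would suggest the file is a good candidate for reproducing the failure.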
Your environment (version of Archivematica, OS version, etc)
CentOS 1.9.2
I can replicate this error on Bionic 1.9.0 and Xenial 1.9.0, and 1.10 versions as well. I cannot replicate on Dockerized Bionic 1.10x, which indicates to me that this can likely be fixed in some sort of deployment configuration setting.
For Artefactual use: Please make sure these steps are taken before moving this issue from Review to Verified in Waffle: