photomedia closed 3 years ago
Actually, a small correction to this issue description: the integer comes from a DOI field, which was set to a number in a test record. A real DOI contains punctuation, so it would not trigger this issue. Still, the metadata may well at some point contain an "identifier" that is just a number.
To patch this up on our (EPrints) side, I have modified our DC export so that id_number is only exported (as "DC relation") when it matches the DOI regex. A pure integer should therefore no longer be exported, which avoids exposing this weakness in the Archivematica import scripts. In most repositories (including Concordia), id_number is supposed to hold a DOI anyway. At Concordia we have a separate field for other, non-DOI identifiers (not exported with DC), so this change only serves to limit the export of unexpected identifiers.
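As a rough illustration of the filter described above, here is a minimal sketch in Python (the real change lives in the EPrints export, which is Perl). The pattern and the function name `relation_from_id_number` are assumptions made for this example, not the regex actually used in EPrints.

```python
import re

# Assumed DOI shape: optional "doi:" or doi.org URL prefix, then "10.",
# a 4-9 digit registrant code, a slash, and a non-empty suffix.
# This is an illustrative pattern, not the one EPrints uses.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+$",
    re.IGNORECASE,
)

def relation_from_id_number(id_number):
    """Return id_number as a string if it looks like a DOI, else None."""
    text = str(id_number).strip()
    return text if DOI_PATTERN.match(text) else None

# A bare integer (as in the test record) is filtered out,
# while a DOI-shaped value is passed through unchanged.
print(relation_from_id_number(824675433))
print(relation_from_id_number("10.1234/abc.def"))
```

With a filter like this, only values that return something other than None would be written out as a DC relation.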
After switching from the EPrints JSON encoder to the Perl JSON encoder, all values are exported in quotes, even integer values. This means this issue is resolved, but the code shouldn't assume that every value in a multi-value field is a string.
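The effect of exporting every value in quotes can be sketched in Python as follows. This is only an illustration under assumptions: the helper name `stringify_values` is hypothetical, field values are assumed to be scalars or flat lists of scalars, and the actual export is done by a Perl JSON encoder rather than this code.

```python
import json

def stringify_values(record):
    """Return a copy of record in which every field value is a string.

    Assumes each field holds either a scalar or a flat list of scalars.
    """
    result = {}
    for field, values in record.items():
        if isinstance(values, list):
            result[field] = [v if isinstance(v, str) else str(v) for v in values]
        else:
            result[field] = values if isinstance(values, str) else str(values)
    return result

record = {
    "dc.relation": ["https://someurl.ca/99", "some other identifier", 824675433]
}

# The integer is now serialized inside double quotes.
exported = json.dumps([stringify_values(record)])
print(exported)
```

Normalizing on export keeps the file uniform, but a consumer that also coerces or checks types would stay robust if another exporter emits bare numbers.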
I just opened an issue at Archivematica about this: https://github.com/archivematica/Issues/issues/1462
The metadata.json file can include fields that look like this:
[ { "dc.relation": [ "https:\/\/someurl.ca\/99", "some other identifier", 824675433 ] } ]
That integer value is not in quotes, which causes a failure on import there. I might try to patch our export so that the identifier is listed in double quotes, even though it is an integer.
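To make the problem concrete, here is a small Python sketch that parses the sample above and reports every value that is not a string. The function name `find_non_string_values` is made up for this example, and it does not reproduce the actual Archivematica import code or its error; it only shows that a standard JSON parser hands back an integer for the unquoted value.

```python
import json

# The sample from the issue, including the escaped slashes.
SAMPLE = r'''[ { "dc.relation": [ "https:\/\/someurl.ca\/99", "some other identifier", 824675433 ] } ]'''

def find_non_string_values(metadata_json):
    """Return (record index, field name, value) for each non-string value."""
    problems = []
    for index, record in enumerate(json.loads(metadata_json)):
        for field, values in record.items():
            # Treat single values and multi-value lists the same way.
            if not isinstance(values, list):
                values = [values]
            for value in values:
                if not isinstance(value, str):
                    problems.append((index, field, value))
    return problems

print(find_non_string_values(SAMPLE))
# → [(0, 'dc.relation', 824675433)]
```

Any import step that calls string methods on each value would need to handle that entry explicitly.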