eprintsug / EPrintsArchivematica

Digital Preservation through EPrints-Archivematica Integration - An EPrints export plugin to Archivematica

Export folder naming according to custom AMID syntax #43

Closed geo-mac closed 2 years ago

geo-mac commented 2 years ago

A use case has arisen in my discussion with other institutions, some of which are considering their implementation of the Archivematica Integration plugin. This use case presents a possible future enhancement and may benefit some organizations. The use case concerns the deployment of multiple repositories feeding content to the same Archivematica pipeline. The problem scenario might look like the following:

  1. An organization with 2 or more EPrints repositories, each managing different content (e.g. research publications, theses, research data, open education resources, etc.).
  2. The organization has decided to use a single pipeline for, say, the publication and the thesis repositories.
  3. Each of the repositories has the Archivematica Integration plugin installed and begins exporting data objects for Archivematica to ingest.
  4. With 2 repositories exporting content simultaneously and in isolation, folder names in the Archivematica pipeline overlap (i.e. both repo 1 and 2 will likely have folders named with the same AMID), causing confusion in Archivematica ingest and issues with the ongoing management of data objects in the pipeline.
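The overlap described in step 4 can be illustrated with a minimal sketch. It assumes, for illustration only, that each repository assigns AMIDs as its own sequence of integers starting at 1 and names its export folder after the AMID; `export_folder_names` is a hypothetical helper, not part of the plugin.

```python
from itertools import count


def export_folder_names(n):
    """Return the folder names for n exports from one repository.

    Assumption: each repository numbers its AMIDs independently from 1
    and uses the bare AMID as the export folder name.
    """
    amid = count(1)
    return [str(next(amid)) for _ in range(n)]


# Two repositories exporting in isolation into one shared pipeline.
publications = export_folder_names(3)
theses = export_folder_names(3)

# Any name present in both lists is ambiguous once it reaches the pipeline.
clashes = sorted(set(publications) & set(theses))
print(clashes)
```

Under these assumptions every folder name exported by one repository is also produced by the other, so the pipeline cannot tell from the name which repository a transfer came from.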

One solution is to customize how folder naming is handled within the plugin. An AMID will always be created locally and stored in the DB; however, it should be possible to customize the AMID used in the folder name when it is exported for Archivematica ingest. Eprint.pm controls the export of data objects, so in theory it can be modified to do this. But is something cleaner necessary, or is it enough to provide some documentation on how best to accommodate this use case by modifying Eprint.pm?

photomedia commented 2 years ago

Hi @geo-mac, thank you for this. I understand exactly the issue raised here, and it should be easily solvable by simply adding a prefix to the folder names? The prefix string could be a configurable option, for example "SPECTRUM". The important point is that the prefix would contain only letters, no digits or underscores. All of the folders would then be exported as SPECTRUM-AMID (SPECTRUM-1, SPECTRUM-2, SPECTRUM-3, etc.).

I believe that this would also be helpful in double-checking that the service callback is for the correct repository (related issue: https://github.com/eprintsug/EPrintsArchivematica/issues/26). The callback already uses a regex to extract the digits from the AMID string sent back: https://github.com/eprintsug/EPrintsArchivematica/blob/53b4a0e39721e81864a26a7762951668fac56dd2/cgi/archivematica/set_uuid#L48 and to ignore anything after _, because Archivematica sometimes uses that for versioning: https://github.com/eprintsug/EPrintsArchivematica/blob/53b4a0e39721e81864a26a7762951668fac56dd2/cgi/archivematica/set_uuid#L47

If you agree that this is a good solution, I will look at how to add it; it shouldn't be too difficult. It would also create the opportunity to add an extra check: if the amid-prefix is defined, the prefix in the callback should match; otherwise the set_uuid call should be ignored, as it relates to a different repo.
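As a rough illustration of the naming scheme proposed above, the following Python sketch builds a PREFIX-AMID folder name and parses one back. It is not the plugin's code (the plugin is written in Perl); the function names `transfer_name` and `parse_transfer_name` and the exact regular expressions are assumptions made for this example.

```python
import re


def transfer_name(prefix, amid):
    """Build a folder name such as 'SPECTRUM-3'.

    The prefix is restricted to letters only (no digits or underscores),
    so the digits of the AMID can still be extracted unambiguously.
    """
    if not re.fullmatch(r"[A-Za-z]+", prefix):
        raise ValueError("prefix must contain letters only")
    return f"{prefix}-{amid}"


def parse_transfer_name(name):
    """Return (prefix, amid) from a name sent back by the pipeline.

    Anything after the first underscore is dropped, since it may be a
    version suffix. The prefix is None when the name is a bare AMID.
    Returns None if the name does not fit either shape.
    """
    base = name.split("_", 1)[0]
    match = re.fullmatch(r"(?:([A-Za-z]+)-)?(\d+)", base)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(transfer_name("SPECTRUM", 3))
print(parse_transfer_name("SPECTRUM-3_1"))
```

Keeping digits and underscores out of the prefix is what lets the digit extraction and the underscore-based version stripping keep working unchanged.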

geo-mac commented 2 years ago

> it should be easily solvable by simply adding a prefix to the folder names?

Yes, this was exactly my thinking too -- great minds think alike! ;-) Although you have been able to develop the idea much further and use it to address #26, which is terrific. Excellent thinking as always @photomedia

It is not a feature we are likely to use but, from my discussions with other UK repos, it sounds like some institutions may find it to be an essential feature.

photomedia commented 2 years ago

@geo-mac I have added this functionality, using an optional "transfer_prefix" configuration, in these commits:
https://github.com/eprintsug/EPrintsArchivematica/commit/bc71c9564a09a94fde1a8c3000935a438341b51e
https://github.com/eprintsug/EPrintsArchivematica/commit/6f4d4da6287687412001b2d6e80be0b4b370fce2
This also allows us to double-check that the AM service callback is for the correct prefix (issue https://github.com/eprintsug/EPrintsArchivematica/issues/26).
I have also added some documentation on this here: https://github.com/eprintsug/EPrintsArchivematica#transfer-folder-prefix
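The callback check discussed in this thread might look roughly like the sketch below. It is an illustration in Python rather than the plugin's Perl; `callback_is_for_this_repo` is a hypothetical name, and the behaviour when no prefix is configured (accept everything) is an assumption based on the prefix being optional.

```python
import re


def callback_is_for_this_repo(name, configured_prefix=None):
    """Decide whether a callback's transfer name belongs to this repository.

    Assumptions: with no prefix configured, no check is made; with a
    prefix configured, the name must be '<prefix>-<digits>', optionally
    followed by an underscore and a version suffix.
    """
    if not configured_prefix:
        return True
    base = name.split("_", 1)[0]  # drop any version suffix
    pattern = re.escape(configured_prefix) + r"-\d+"
    return re.fullmatch(pattern, base) is not None


print(callback_is_for_this_repo("SPECTRUM-7", "SPECTRUM"))
print(callback_is_for_this_repo("THESES-7", "SPECTRUM"))
```

A callback that fails this check would simply be ignored, on the reasoning that it relates to a different repository sharing the same pipeline.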