Closed photomedia closed 4 years ago
Rationale
It would be very useful from a management and quality assurance perspective to be able to confirm, all from the same management screen, that an EPrint was successfully exported, that Archivematica picked up the transfer, and that Archivematica successfully created and stored an AIP (Archival Information Package) for the transfer. This would remove the need to visit two different systems to confirm that the overall workflow succeeded.
Implementation
I see two possible methods of implementing in Archivematica:
1) The Archivematica Storage Service application has in-built functionality to make REST calls to external services following certain actions (e.g. successfully storing an AIP). The Archidora Archivematica-Islandora integration, for example, makes use of this functionality to trigger actions in Islandora following the AIP storage event.
2) We could use Archivematica's automation-tools framework - which we are already planning to use in conjunction with this plugin - to send a POST request to EPrints with the relevant information following confirmation that a package has successfully been stored. This is a similar approach to the one that University of York (UK) has taken to update their researchdatayork application from Archivematica (see blog post and the status.py script they're using to accomplish this).
Either approach would require a REST endpoint in EPrints that the Archivematica Storage Service would make a request to, which would in turn update the relevant row in the archivematica table in EPrints with the AIP's UUID and/or a "success" boolean value.
Further investigation and thinking about the approaches is needed.
Storage Service documentation links
From the perspective of the rationale listed above, writing the AIP UUID from Archivematica back and a "success" status indicator to the appropriate row in EPrints would be sufficient.
The question of whether it's worth the development effort is still very open. It would be nice to hear the perspectives of other potential users of this plugin.
Tim,
I would like to throw in my support for the integration of EPrints and Archivematica. We use EPrints here to handle our Electronic Theses and Dissertations (ETDs) and would like to have those transferred, either individually or in an ongoing fashion, to the Archivematica system. So EPrints -> Archivematica would be a very nice add-on to our institutional repository.
I could easily envision:
We are just starting to look at Archivematica here and at how to tie it into our systems, and your concept is spot on for us.
-Brian Gregg
Brian D. Gregg Solutions Architect University of Pittsburgh. University Library System.
From: Tim Walsh notifications@github.com Sent: Thursday, July 25, 2019 2:49 PM To: eprintsug/EPrintsArchivematica EPrintsArchivematica@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [eprintsug/EPrintsArchivematica] Archivematica Sending Information Back to EPrints (#10)
From the perspective of the rationale listed above, writing the AIP UUID from Archivematica back and a "success" status indicator to the appropriate row in EPrints would be sufficient.
The question of whether it's worth the development effort is still very open. It would be nice to hear the perspectives of other potential users of this plugin.
@timothyryanwalsh
Either approach would require a REST endpoint in EPrints that the Archivematica Storage Service would make a request to, which would in turn update the relevant row in the archivematica table in EPrints with the AIP's UUID
A RESTful endpoint has been supported since EPrints 3.2. I have never used it but, based on threads from the EP-tech mailing list, I have heard - brace yourself - that it isn't well documented! This historic blog post from the DepositMOre project describes what is possible, though, in conjunction with SWORD2. CRUD is definitely supported, but one would assume that more is possible in 2019, and there are certainly institutions performing sophisticated machine-to-machine (m2m) interactions with the EPrints REST endpoint, e.g. Caltech.
Would it be sufficiently useful to have an Archivematica ID (UUID?) sent back to EPrints whenever Archivematica processes it for preservation? @photomedia
I am inclined to say that it would be sufficient to have the Archivematica UUID in EPrints for each corresponding eprint. From this alone a lot can be inferred: e.g. receipt of an Archivematica UUID is equivalent to confirmation that the AIP was stored, and a link to the object/AIP in Archivematica can be constructed from the UUID.
Based on this discussion, I am adding a section to the README to specify that we need Archivematica to send an Archivematica UUID back to EPrints once it has processed the item, using option 1 (the Archivematica Storage Service application's in-built functionality to make REST calls). I think the preference would be to use a RESTful endpoint in EPrints.
After looking into this a bit further, I've confirmed that in Archivematica Storage Service 0.15+, we can add a post-store callback that would send a GET, POST, PUT, or PATCH request to EPrints containing the AIP's UUID in the body of the request. The URI that this is sent to is configurable.
The question I'm facing is: how will the Archivematica Storage Service know what the Eprints Archivematica dataset ID is, so that we can construct the right URI for the API call? Presumably we would have to include it with the transfer in some way and then figure out a way to make that information available to the Storage Service.
Screenshot to show the Edit Callback screen from the Archivematica Storage Service (qa/0.x branch as of yesterday):
The question I'm facing is: how will the Archivematica Storage Service know what the Eprints Archivematica dataset ID is, so that we can construct the right URI for the API call? Presumably we would have to include it with the transfer in some way and then figure out a way to make that information available to the Storage Service.
Yes, we will need to pass a
Body:
{AIP UUID': '
From my understanding of the link on the EPrints REST endpoint that @geo-mac shared above, I think the simplest implementation for EPrints would be to use the EPrints Archivematica dataset ID (which uniquely identifies the row in the Archivematica table in EPrints that will store the AIP UUID and associate it to the correct EPrint) to construct the URI, and pass only the AIP UUID in the body. That way the API call from Archivematica would update the appropriate resource directly, without requiring additional logic/programming on the EPrints side.
We could instead pass the EPrints ID in the URI or body, and then add logic to EPrints to have it go find the appropriate row in the Archivematica dataset and update the UUID, as @photomedia suggests above.
To figure out which option is better for us, we probably need more clarity on:
@photomedia - Would you mind looking into the second question, particularly around if there is updated documentation available? Is this up to date/accurate?
We could instead pass the EPrints ID in the URI or body, and then add logic to EPrints to have it go find the appropriate row in the Archivematica dataset and update the UUID, as @photomedia suggests above.
@timothyryanwalsh , Actually, I suggested that the EPrints Archivematica dataset ID is used directly rather than the EPrint ID.
Yes, you did! My mistake!
What exactly the SWORD API endpoints available to us in EPrints are @photomedia - Would you mind looking into the second question, particularly around if there is updated documentation available? Is this up to date/accurate?
I think that @wfyson would be the best person to comment on that. Will, could you please let us know what would be the preferred way that Archivematica would send the callback request to have the Archivematica Dataset updated in EPrints with the UUID of the processed item? Should we make a CRUD request as described by the documentation here http://wiki.eprints.org/w/API:EPrints/Apache/CRUD ?
Hi @photomedia @timothyryanwalsh,
Apologies for the delay in getting back to you about this! A CRUD request as documented at the link above would be the best way to go about updating an existing record of the new Archivematica dataset that this EPrints plugin would introduce.
To do this you'd need to know the ID of the Archivematica record (i.e. the unique ID that EPrints stores for the record, not the UUID) and then we can use either a default EPrints import plugin (like the XML plugin) or a custom import plugin we could develop as part of this work to PUT an update to the archivematica record.
Expressed as a curl command it would look something like this:
curl -v -H "Content-Type: application/vnd.eprints.data+xml;" -X PUT --data-binary "@/path/to/data.xml" -u <username>:<password> http://myrepository.org/id/archivematica/<id>
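For anyone scripting this rather than using curl, the same request can be sketched in Python with only the standard library. This is a rough equivalent under assumptions, not a tested EPrints client: the function name `build_put_request` is hypothetical, the XML payload is a placeholder (the real payload depends on the import plugin), and the Content-Type value is copied verbatim from the curl command above.

```python
import base64
import urllib.request

# Copied verbatim from the curl command in this thread.
CONTENT_TYPE = "application/vnd.eprints.data+xml;"


def build_put_request(base_url, record_id, payload, username, password):
    """Build (but do not send) a PUT to <base_url>/id/archivematica/<record_id>."""
    url = f"{base_url.rstrip('/')}/id/archivematica/{record_id}"
    # HTTP Basic auth header, the equivalent of curl's -u <username>:<password>.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Content-Type": CONTENT_TYPE,
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder payload; a real call would send the contents of data.xml.
    req = build_put_request(
        "http://myrepository.org", 42, b"<placeholder/>", "user", "secret"
    )
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before pointing the script at a live repository.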
We'd also need to add a UUID field to the new archivematica dataset as defined in https://github.com/eprintsug/EPrintsArchivematica/blob/master/lib/plugins/EPrints/DataObj/Archivematica.pm but that shouldn't be a problem.
I hope this helps answer your question! Let me know if there's any more information you need!
Thank you, @wfyson
To do this you'd need to know the ID of the Archivematica record (i.e. the unique ID that EPrints stores for the record, not the UUID)
Yes, so after some discussion with Archivematica / Artefactual, the simplest way for Archivematica to make the callback and include the EPrints Archivematica Dataset ID for the AIP is to have that ID included in the filename of the AIP that is passed. Archivematica would send back the full filename in the callback. Therefore, I will make a change to the spec of this export plugin to include it. Currently, the filename is:
repositoryid-eprintid-lastmoddate
I will change this in the spec to:
repositoryid-eprintid-lastmoddate---EPrintsArchivematicaDatasetID
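With this naming scheme, whatever receives the full filename has to split off the dataset ID again. A minimal sketch of that step, assuming the `---` separator appears exactly as written above; the function name `split_transfer_name` and the sample values are hypothetical:

```python
def split_transfer_name(name):
    """Split 'repositoryid-eprintid-lastmoddate---datasetid' at the last '---'.

    Returns (prefix, dataset_id). Raises ValueError if the separator or the
    dataset ID is missing.
    """
    prefix, separator, dataset_id = name.rpartition("---")
    if not separator or not dataset_id:
        raise ValueError(f"no '---<dataset ID>' suffix in {name!r}")
    return prefix, dataset_id


if __name__ == "__main__":
    # Hypothetical example values; the date format is an assumption.
    print(split_transfer_name("myrepo-123-2019-08-01---45"))
```

Splitting at the last `---` keeps the function working even though the prefix itself contains single hyphens.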
https://github.com/eprintsug/EPrintsArchivematica/commit/0a8676a30a7d75f9afe023a6ccf68f6096f1f333
https://github.com/eprintsug/EPrintsArchivematica/commit/2a8e0b9dad709da3ecbed9923297c0e8f9d45fac
Following discussions with @wfyson and @timothyryanwalsh, we agreed that we will rename the top folder of the AIP exported from EPrints to the EPrintsArchivematicaDatasetID itself. This will allow us to have a more generalized solution for the callback on the Archivematica side, without the need to run split operations or regular expressions on the folder name to send the ID back to the submitting system. The assumption is that the ID is the folder name. Therefore, I will close this issue and rename the top folder in the spec to:
EPrintsArchivematicaDatasetID
The Archivematica callback will do the following:
curl -v -H "Content-Type: application/vnd.eprints.data+xml;" -X PUT --data-binary "@/path/to/data.xml" -u <username>:<password> http://myrepository.org/id/archivematica/<id>
where <id> = EPrintsArchivematicaDatasetID = AIP folder name and /path/to/data.xml contains the Archivematica UUID of this AIP.
Thanks @photomedia and @wfyson - this is looking great! I think we've come to a nice solution here.
Two minor notes:
{
"AIP UUID": "<UUID>"
}
<id> = EPrintsArchivematicaDatasetID = transfer folder name (the AIP folder name also contains the UUID of the AIP, appended to the end of the name; I will make sure the value that is being used to construct the URI for the PUT request from the Archivematica Storage Service does not include this UUID)
Thanks!
We could also use just 'UUID' for the key in the JSON body, or whatever other value most cleanly matches to an appropriate name for that column in the EPrints Archivematica dataset table
@timothyryanwalsh Yes, it doesn't have to be data.xml, it can be data.json with the AIP UUID stored as you suggest. I made the change in the spec to JSON.
I also corrected that we are talking about the transfer folder name, not the full post-Archivematica AIP folder name. Thanks!
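To make the distinction between the two folder names concrete, here is a small sketch of how a callback could derive the PUT URI and JSON body from an AIP folder name. It is an illustration only, not Storage Service code: it assumes the AIP folder name is the transfer folder name followed by a hyphen and the AIP UUID, and the function names and sample values are hypothetical.

```python
import json
import re

# Assumption: AIP folder name = "<transfer folder name>-<AIP UUID>".
_UUID_SUFFIX = re.compile(
    r"-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def split_aip_folder_name(aip_folder_name):
    """Return (transfer_folder_name, aip_uuid) from an AIP folder name."""
    match = _UUID_SUFFIX.search(aip_folder_name)
    if match is None:
        raise ValueError(f"no trailing UUID in {aip_folder_name!r}")
    return aip_folder_name[: match.start()], match.group(1)


def build_callback(base_url, aip_folder_name):
    """Return (uri, json_body) for the PUT request described in this thread."""
    # The transfer folder name is the EPrintsArchivematicaDatasetID.
    dataset_id, aip_uuid = split_aip_folder_name(aip_folder_name)
    uri = f"{base_url.rstrip('/')}/id/archivematica/{dataset_id}"
    body = json.dumps({"AIP UUID": aip_uuid})
    return uri, body


if __name__ == "__main__":
    # Hypothetical dataset ID (45) and UUID.
    uri, body = build_callback(
        "http://myrepository.org", "45-3f2a6c1e-8b7d-4e2a-9c1f-0a1b2c3d4e5f"
    )
    print(uri)
    print(body)
```

Only the UUID suffix is stripped, so the URI is built from the transfer folder name alone while the UUID travels in the body.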
{
"AIP UUID": "<UUID>"
}
Yes, I am adding it to the README and closing this issue.
This issue is for the question of what, if any, information should be sent back to EPrints. Would it be sufficiently useful to have an Archivematica ID (UUID?) sent back to EPrints whenever Archivematica processes it for preservation? Is this information necessary to have in EPrints? Is it worth the development effort to add this to the plugin? If so, would this information be sent back to EPrints using a SWORD call? What are the limitations/capabilities of Archivematica in this respect?