Open McFateM opened 7 years ago
Documenting findings thus far... As expected the REST ingester will indeed create a specific object if that object does not already exist in the repository, provided you specify a full-PID instead of a namespace like so: "-n test:12345".
Thanks very much @McFateM. I'll play with this over the weekend to better understand how it works and document it in the README.
As stated in #2, currently we can only ingest new objects. This is done via an HTTP POST operation. If we want to replace an existing object entirely or partially, we'll need to add the ability to PUT its constituent parts. As far as I know, everything about a Fedora object can be updated except its PID. Using Islandora REST's list of features, we'll need to be able to:
Restoring an object from a Bag, for example, should be fairly simple, since all we need to do is put whatever is in the Bag back where it originally came from in the object we're restoring. That assumes that the Bag is complete, and that any property information we might need to "restore" is there. Bags don't include many object properties, such as owner and state, and they don't contain any datastream properties either. But we can probably add those to Bags if we want.
Let me think about how the current Ingester script is structured so we can build all this in. Issue #1 is also relevant to this.
This all sounds very good. In my case the foxml.xml file inside each bag does contain all of the necessary restore info as far as I can tell. Most exists within the elements subordinate to foxml:objectProperties. One example from my test object...
<?xml version="1.0" encoding="UTF-8"?>
<foxml:digitalObject VERSION="1.1" PID="grinnell:99"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
@McFateM I tried ingesting an object using foxml and it didn't work as expected. Here's what I did:
-n option.
The ingester interpreted the 'foxml.xml' file as a datastream and added it as such; in other words, it didn't parse any information out of it. Maybe I didn't replicate all your steps; if not, please let me know.
As far as I can tell, the Islandora REST module doesn't support ingesting objects from FOXML, which would mean that the Islandora REST Ingester can't, since it uses that API. But, adding the ability to do that would be a great idea. I assume that DGI would be open to a PR that would add that feature. Let me investigate that.
Yep, FOXML is one of the “datastreams” that my -b option explicitly ignores, so when I ran the REST ingester in that mode it would build my target object, but as you indicated, the specifics contained in FOXML like object owner, creation date, etc., were lost so the “restored” object was not an exact copy of the Bagit content.
I've confirmed that you can reuse PIDs, which means that if you want to restore an object, you need to first delete the original object if it still exists and then POST the backup using the same PID. (If the old object still existed, you'd need to PUT its parts; you can't POST it.)
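The delete-then-POST sequence described above can be sketched as a small planning function. This is only an illustration: the `plan_restore` name is hypothetical, and the `islandora/rest/v1/object` path is an assumption about how the Islandora REST module lays out its endpoints, so check it against your own installation. The function makes no network calls; it only lists the requests a restore would need.

```python
from typing import List, Tuple

# Assumed endpoint layout for the Islandora REST module (not confirmed in this thread).
OBJECT_ENDPOINT = "islandora/rest/v1/object"


def plan_restore(pid: str, exists: bool) -> List[Tuple[str, str]]:
    """Return the (HTTP method, path) pairs needed to restore `pid` from a backup."""
    steps = []
    if exists:
        # An existing object cannot be POSTed over, so it has to be deleted first.
        steps.append(("DELETE", f"{OBJECT_ENDPOINT}/{pid}"))
    # The PID is now free (or was never taken), so the backup can be POSTed with it.
    steps.append(("POST", OBJECT_ENDPOINT))
    return steps


if __name__ == "__main__":
    for method, path in plan_restore("test:12345", exists=True):
        print(method, path)
```

Whether the object currently exists would have to be determined separately, for example with a GET on the same object path.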
With regard to the FOXML, how about this: if the file 'foxml.xml' is present in the data directory, it is not ingested as a datastream (because it is not a datastream), but instead is parsed for the object owner and label (the only two properties supported by the REST module's POST method; we can't POST the object's state, but we could perform a secondary PUT to change it). For datastreams, we can parse out and POST the controlGroup, label, state, mimeType, checksumType, and versionable property.
That's the best we can do. We wouldn't have versioned data (e.g. older versions of datastreams) but it would be pretty close.
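As a rough illustration of the parsing proposed above, here is a minimal Python sketch (the ingester itself is not written in Python, and this is not its actual code). It assumes the usual FOXML 1.1 layout: object properties as `foxml:property` elements with `NAME`/`VALUE` attributes, and datastream attributes split between `foxml:datastream` and its `foxml:datastreamVersion` children. The `parse_foxml` name is hypothetical, and the sample label, owner, and datastream values are made up for the example.

```python
import xml.etree.ElementTree as ET

FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}
MODEL = "info:fedora/fedora-system:def/model#"

# Made-up sample; only the PID comes from the thread.
SAMPLE = """<foxml:digitalObject VERSION="1.1" PID="grinnell:99"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:objectProperties>
    <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="Test object"/>
    <foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE="admin"/>
  </foxml:objectProperties>
  <foxml:datastream ID="OBJ" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
    <foxml:datastreamVersion ID="OBJ.0" LABEL="Original file" MIMETYPE="image/tiff">
      <foxml:contentDigest TYPE="SHA-1" DIGEST="none"/>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>"""


def parse_foxml(xml_text):
    """Return (object_properties, datastreams) extracted from a FOXML document."""
    root = ET.fromstring(xml_text)
    props = {
        p.get("NAME"): p.get("VALUE")
        for p in root.findall("foxml:objectProperties/foxml:property", FOXML_NS)
    }
    obj = {
        "pid": root.get("PID"),
        "owner": props.get(MODEL + "ownerId"),
        "label": props.get(MODEL + "label"),
        "state": props.get(MODEL + "state"),
    }
    datastreams = {}
    for ds in root.findall("foxml:datastream", FOXML_NS):
        versions = ds.findall("foxml:datastreamVersion", FOXML_NS)
        if not versions:
            continue
        latest = versions[-1]  # assumption: versions are listed oldest first
        digest = latest.find("foxml:contentDigest", FOXML_NS)
        datastreams[ds.get("ID")] = {
            "controlGroup": ds.get("CONTROL_GROUP"),
            "state": ds.get("STATE"),
            "versionable": ds.get("VERSIONABLE") == "true",
            "label": latest.get("LABEL"),
            "mimeType": latest.get("MIMETYPE"),
            "checksumType": digest.get("TYPE") if digest is not None else None,
        }
    return obj, datastreams


if __name__ == "__main__":
    obj, datastreams = parse_foxml(SAMPLE)
    print(obj)
    print(datastreams)
```

Only the most recent version of each datastream is read here, which matches the caveat that older datastream versions would not be restored.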
Precisely what I had in mind Mark.
I’ve just completed the recovery of our repository and have things in much better shape now, but am looking at migrating the repository to a new server this evening if possible. So I can’t easily contribute to the project at this time, but if all goes well, later this week I should be able to return my attention here and assist if necessary. Take care.
Another related issue: #7.
@McFateM you can now encode PIDs of existing objects in the object-level directory names. Would be useful if you had a batch of objects (more than one) to replace. See the README for details.
Passing a PID as the value of -n only works as expected if you are loading only a single object.
As part of work on #9, I have changed this so that you need to omit --namespace in order for the ingester to reuse PIDs from object-level directories. This also means you can now ingest multiple objects this way: if you omit the namespace option, all objects in the input directory will be restored (provided their PIDs are valid).
All required work to get object owner, label, and state from foxml.xml is done, but work for datastreams remains:
For datastreams, we can parse out and POST the controlGroup, label, state, mimeType, checksumType, and its versionable property.
I have two interests in this module... 1) to migrate objects that have been bagged from one repository to another, and 2) to restore an existing object from some archive (a Bagit store in my case). Both of these options would benefit from, perhaps require, the ability to control the new object's PID.
I haven't studied Fedora's REST API enough to know if/how this can be done so please enlighten me if you already know how this might work in REST.
What I imagine implementing is a modification to the -n (namespace) option with the ability to accept a full PID, like "-n grinnell:12345". In response the code would purge object grinnell:12345 if it exists, and ingest the new object in its place with a PID of grinnell:12345.
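To make the proposal concrete, here is a minimal Python sketch of how a value passed to -n could be classified as either a bare namespace or a full PID. The `parse_n_option` helper is hypothetical and is not part of the ingester; it relies only on the colon that separates a namespace from an object ID in PIDs such as grinnell:12345.

```python
def parse_n_option(value):
    """Split a -n value into (namespace, object_id).

    object_id is None when the value is a bare namespace such as "grinnell".
    """
    namespace, sep, object_id = value.partition(":")
    if not namespace or (sep and not object_id):
        # Reject values like ":12345" or "grinnell:" rather than guessing.
        raise ValueError(f"Not a namespace or full PID: {value!r}")
    return namespace, (object_id if sep else None)


if __name__ == "__main__":
    print(parse_n_option("grinnell"))
    print(parse_n_option("grinnell:12345"))
```

A caller could then purge and re-ingest when an object ID is present, and fall back to ordinary namespace-based ingestion when it is not.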