Islandora / documentation

Contains islandora's documentation and main issue queue.

Can't reindex a Node after it is deleted from Fedora #1213

Open seth-shaw-unlv opened 5 years ago

seth-shaw-unlv commented 5 years ago

It appears that once a node has been deleted from Fedora it cannot be added back:

  1. Create an islandora_object node
  2. Ensure it is in Fedora
  3. From the Drupal 8 Content Page, select the node and run the 'Delete Node from Fedora' action (do the triplestore one too if you want).
  4. Verify the node is no longer in Fedora (you will see a tombstone page).
  5. From the Drupal 8 Content Page, select the node and run the 'Index in Fedora' action (might as well run the triplestore one too, while you are at it).
  6. See that the Node's Fedora URI pseudo-field hasn't returned (as it isn't listed in Gemini), the tombstone is still there, and there are errors in '/opt/karaf/data/log/camel.log':
    2019-07-10 19:47:27,192 | DEBUG | -fcrepo-content] | ObjectHelper                     | 57 - org.apache.camel.camel-core - 2.20.4 | Cannot find class: urnuuid
    2019-07-10 19:47:27,192 | DEBUG | -fcrepo-content] | SendDynamicProcessor             | 57 - org.apache.camel.camel-core - 2.20.4 | >>>> http://localhost:8000/milliner/node/7b383222-a616-4e12-af41-d641db74e428?connectionClose=true Exchange[ID-claw-1561579691424-7-313]
    2019-07-10 19:47:27,193 | DEBUG | -fcrepo-content] | HttpProducer                     | 110 - org.apache.camel.camel-http4 - 2.20.4 | Executing http POST method: http://localhost:8000/milliner/node/7b383222-a616-4e12-af41-d641db74e428
    2019-07-10 19:47:27,322 | DEBUG | -fcrepo-content] | HttpProducer                     | 110 - org.apache.camel.camel-http4 - 2.20.4 | Http responseCode: 410
    2019-07-10 19:47:27,323 | DEBUG | -fcrepo-content] | DefaultErrorHandler              | 57 - org.apache.camel.camel-core - 2.20.4 | Failed delivery for (MessageId: queue_islandora-indexing-fcrepo-content_ID_claw-44635-1561579238412-3_21_-1_1_1 on ExchangeId: ID-claw-1561579691424-7-313). On delivery attempt: 0 caught: org.apache.camel.http.common.HttpOperationFailedException: HTTP operation failed invoking http://localhost:8000/milliner/node/7b383222-a616-4e12-af41-d641db74e428 with statusCode: 410
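To find every node affected this way, the failed deliveries can be pulled out of the log. The following is a minimal sketch that assumes the message format shown in the excerpt above; the function names are made up for illustration and are not part of Islandora or Camel.

```python
import re

# Matches the HttpOperationFailedException message format seen in the log excerpt.
FAILED = re.compile(
    r"HTTP operation failed invoking (?P<url>\S+) with statusCode: (?P<status>\d+)"
)


def failed_deliveries(log_lines):
    """Yield (url, status) for each failed HTTP delivery found in log_lines."""
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            yield match.group("url"), int(match.group("status"))


def gone_urls(log_lines):
    """Return the sorted, de-duplicated URLs that answered 410 Gone."""
    return sorted({url for url, status in failed_deliveries(log_lines) if status == 410})
```

Calling `gone_urls(open("/opt/karaf/data/log/camel.log"))` would list the Milliner endpoints that answered 410, which here correspond to nodes whose Fedora resources are tombstoned.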

So, it looks like because we have a tombstone for a resource, once it is deleted we can't put it back. This raises a few questions in my mind:

  1. Can we tell Fedora "no really, I have the stuff that goes here and I want to put it back"? Which leads to "how"?
  2. If not, re-indexing the same node with a different URI would require changing the node's UUID; can we do that?
  3. Failing that, how can we tell people that they really don't want to delete something from Fedora if there is any chance they will want to put it back (unless they want to copy the node's data to a new node)? It doesn't seem like relying on documentation would be sufficient for this, although perhaps it is enough of an edge case that it is.

DiegoPino commented 5 years ago

  1. You need to kill the tombstone with an additional HTTP request (see the API docs), but then what is the purpose of a tombstone if it's going to be removed?
  2. Yes, you can. The UUID is updatable by user code, but if anything else points or refers to that UUID (as it eventually will), you need to change it Drupal-wide.
  3. To put it back, you need to delete the tombstone anyway.
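For reference, the extra request is a DELETE against the deleted resource's `fcr:tombstone` child. Below is a minimal sketch using only the Python standard library. The base URL in the usage note is a placeholder, authentication is omitted (an actual Islandora install would need to send credentials), and the `opener` parameter is a hypothetical hook added so the function can be exercised without a live Fedora.

```python
import urllib.error
import urllib.request


def tombstone_url(resource_url):
    """Return the tombstone URL for a deleted Fedora resource."""
    return resource_url.rstrip("/") + "/fcr:tombstone"


def delete_tombstone(resource_url, opener=urllib.request.urlopen):
    """Send DELETE to the resource's tombstone and return the HTTP status code."""
    request = urllib.request.Request(tombstone_url(resource_url), method="DELETE")
    try:
        with opener(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers are raised by urllib; report the code instead.
        return err.code
```

For example, `delete_tombstone("http://localhost:8080/fcrepo/rest/some/resource")` would target `.../some/resource/fcr:tombstone`. Once the tombstone is gone, the original URI should accept a new PUT.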

seth-shaw-unlv commented 5 years ago

> what is the purpose of a tombstone if it's going to be removed?

Usually, yeah, when we kill something we intend for it to be gone for good. However, there are occasionally those "my bad" moments where "I didn't mean to do that" and you want to put it back.

This is certainly an edge-case and not a day-to-day workflow thing, but my guess is someone will eventually accidentally delete something (the Delete from Fedora action is second on the action dropdown list on the Content management page) and need it put back into Fedora.

dannylamb commented 5 years ago

Could've sworn we were deleting tombstones. This has come up before. I'll have to take a deeper look to see what's really happening vs. my expectations.

wrandtkeflvc commented 5 years ago

So, I may be way off here if "tombstone" is a Fedora term of art rather than the generic term.

A use case where you would want a tombstone for a while, but not forever, is when a harvester uses tombstones to remove records it picked up in an earlier harvest. The OAI-PMH standard version 2 (the current version) was released in 2002, when downloading a video clip was an all-day wait. When bandwidth was scarcer, maybe more harvesters worked like this: they did not reharvest all records on a schedule, because a full reharvest took too much time. Instead, they fetched just the new or updated records with a datestamp later than the last harvest, plus all the tombstones with a datestamp later than the last harvest so they could purge the older records. Nowadays, just about no one would do this. Bandwidth is better, so harvesters would reharvest all collections from scratch.

But, there's always the chance. And, in 2019, Encore is like that - the OAI-PMH harvester in Encore updates by going in on a schedule and doing OAI-PMH for all items with datestamp more recent than last harvest, then by doing OAI-PMH for tombstones and purging.
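A rough sketch of that incremental pattern, for anyone unfamiliar with it: the harvester asks for everything changed since its last run (the OAI-PMH `from` argument) and treats headers marked `status="deleted"` as tombstones to purge. The function names and the example endpoint are invented for illustration; this is not how Encore itself is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def selective_harvest_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Build a ListIdentifiers request for records changed since last_harvest."""
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "from": last_harvest,
    }
    return base_url + "?" + urlencode(params)


def split_delta(response_xml):
    """Split an OAI-PMH response into (updated, deleted) identifier lists."""
    root = ET.fromstring(response_xml)
    updated, deleted = [], []
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        # Deleted records (tombstones) carry status="deleted" on the header.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
    return updated, deleted
```

A harvester built this way only learns that an item is gone if the repository keeps reporting the deleted header for some time after the deletion.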

It's very rare for a harvester to work that way, and we were surprised.

Tombstones are optional in the OAI-PMH.

In terms of community expectations,

There's the technical reason of undoing deletes. For that, you would keep metadata and files for a while, then purge so as not to have a bunch of stuff hanging around forever. Anecdotally, here we do sometimes have accidental deletions in Islandora 7. We have a coop server where content is periodically synced, and we do nightly backups. So far, accidental deletions are rare, usually involve a whole collection of items at a time, and are reported the same week they occur. When these happen, getting the items back is a manual process of pulling the content from coop or a backup, and it is kind of clunky. Anyway, with many users (something like 20 Islandora sites, each with multiple user accounts, although some are staffed intermittently) able to delete materials, accidental deletions happen a couple of times a year. Restoring accidental deletions is something to be nervous about, but probably not a huge need. If it can be done from backup, then that's probably good enough, and instructions for how to do this from the backend are good enough.

There's the community reason of showing that an item used to be there, to support reliable citations. For example, maybe an article is posted, then cited, then removed, and then someone wants to check the cite. The tombstone allows the person to confirm that the article was there. For that, the tombstone is metadata only: just enough for a citation, and a reason why the item was deleted. This is probably what librarians want.

It seems like removing the tombstone in order to reinstate the item is fine. The link and citation would work, which would meet community expectations. You might want to check that whatever is going out with OAI-PMH updates the datestamp to the date the item was reinstated. As far as I know, the datestamp for OAI-PMH is just there to let harvesters do selective harvests, and it is not descriptive metadata, so updating it won't make the item appear newer or anything. The item should already have a publication date in the descriptive metadata, which is unrelated to the datestamp.

whikloj commented 5 years ago

Doesn't look like we are deleting tombstones. I feel like we might have thought this should be optional/configurable.

The only reason you should re-use a URI is for the same content, but then why would we allow you to DELETE and then PUT? Seems better to just PUT the updated content over the top.

That being said, Milliner does the delete using Chullo (for RDFSource objects).

We can add a loop to Milliner to also remove the tombstone, but it seems like a question of what the use case is for manually deleting the resources and then re-creating them again.
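To make the optional/configurable idea concrete, here is a language-neutral sketch in Python; Milliner itself is PHP, and this is not its code. `http_delete` stands in for whatever client performs the request and returns a status code, and `purge_tombstone` is a hypothetical setting. Treating 410 as "already deleted, tombstone still present" is an assumption based on the behaviour reported earlier in this issue.

```python
def delete_resource(fedora_url, http_delete, purge_tombstone=False):
    """Delete a resource and optionally its tombstone.

    Returns the list of (url, status) pairs for the requests made.
    """
    calls = []
    status = http_delete(fedora_url)
    calls.append((fedora_url, status))
    # Only go after the tombstone if the resource is actually gone:
    # 204 means this call deleted it, 410 is assumed to mean it was
    # deleted earlier and a tombstone is in the way.
    if purge_tombstone and status in (204, 410):
        tombstone = fedora_url.rstrip("/") + "/fcr:tombstone"
        calls.append((tombstone, http_delete(tombstone)))
    return calls
```

Leaving `purge_tombstone` off by default keeps today's behaviour, so sites that rely on tombstones (for citations or harvesters) would not be affected unless they opt in.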