NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0
41 stars 27 forks source link

Investigate possible causes of broken package version chains and loss of data objects #1586

Open amoeba opened 3 years ago

amoeba commented 3 years ago

Had an odd one come in today from OPC that Rani reported and I looked into:

A user was using the editor UI to update a package with one metadata record and five data objects (https://opc.dataone.org/view/doi%3A10.25494%2FP6FS3P) and ended up with a package with only the metadata that was disconnected from the previous version. The user said they saw no errors during the process. This might be hard to track down but I thought I'd write it up so we might be able to at least look at the code and see if something like a flakey internet connection or an RDF serialization bug might cause the behavior. At the very least, we'll have this ticket in case we see it again.

Many more details follow...

The user went from a package with with the correct relationships:

To an updated version:

Interestingly, the second ORE above wasn't obsoleted by anything. This is a good clue.

The second interesting clue is that the first ORE is obsoleted by a third, intermediate ORE, resource_map_urn:uuid:cbafbb8b-002f-4269-a731-7cc168a3ae62 which showed up in the Tomcat logs with various ORE parser exceptions, see below:

ORE parser errors ``` org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.ore.ResourceMapFactory.parseResourceMap(ResourceMapFactory.java:467) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap._init(ForesiteResourceMap.java:160) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap.(ForesiteResourceMap.java:140) at org.dataone.cn.indexer.resourcemap.ResourceMapFactory.buildResourceMap(ResourceMapFactory.java:50) at edu.ucsb.nceas.metacat.index.resourcemap.ResourceMapSubprocessor.processResourceMap(ResourceMapSubprocessor.java:108) at edu.ucsb.nceas.metacat.index.resourcemap.ResourceMapSubprocessor.processDocument(ResourceMapSubprocessor.java:96) at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:236) at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:426) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:644) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:598) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:149) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130) at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88) at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30) at java.lang.Thread.run(Thread.java:748) org.dspace.foresite.OREParserException: org.dspace.foresite.OREParserException: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap.(ForesiteResourceMap.java:142) at org.dataone.cn.indexer.resourcemap.ResourceMapFactory.buildResourceMap(ResourceMapFactory.java:50) at edu.ucsb.nceas.metacat.index.resourcemap.ResourceMapSubprocessor.processResourceMap(ResourceMapSubprocessor.java:108) at edu.ucsb.nceas.metacat.index.resourcemap.ResourceMapSubprocessor.processDocument(ResourceMapSubprocessor.java:96) at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:236) at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:426) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:644) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:598) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:149) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130) at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88) at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: org.dspace.foresite.OREParserException: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap._init(ForesiteResourceMap.java:164) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap.(ForesiteResourceMap.java:140) ... 14 more Caused by: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.ore.ResourceMapFactory.parseResourceMap(ResourceMapFactory.java:467) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap._init(ForesiteResourceMap.java:160) ... 15 more metacat-index 20201029-19:05:55: [ERROR]: org.dspace.foresite.OREParserException: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') [edu.ucsb.nceas.metacat.index.SolrIndex:process:241] org.dspace.foresite.OREParserException: org.dspace.foresite.OREParserException: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap.(ForesiteResourceMap.java:142) at org.dataone.cn.indexer.resourcemap.ResourceMapFactory.buildResourceMap(ResourceMapFactory.java:50) at edu.ucsb.nceas.metacat.index.resourcemap.ResourceMapSubprocessor.processResourceMap(ResourceMapSubprocessor.java:108) at edu.ucsb.nceas.metacat.index.resourcemap.ResourceMapSubprocessor.processDocument(ResourceMapSubprocessor.java:96) at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:236) at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:426) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:644) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:598) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:149) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130) at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88) at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: org.dspace.foresite.OREParserException: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap._init(ForesiteResourceMap.java:164) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap.(ForesiteResourceMap.java:140) ... 14 more Caused by: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.ore.ResourceMapFactory.parseResourceMap(ResourceMapFactory.java:467) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap._init(ForesiteResourceMap.java:160) ... 15 more metacat-index 20201029-19:05:55: [ERROR]: SolrIndex.update - could not update the solr index for the object resource_map_urn:uuid:cbafbb8b-002f-4269-a731-7cc168a3ae62 since org.dspace.foresite.OREParserException: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') [edu.ucsb.nceas.metacat.index.SolrIndex:update:604] org.apache.solr.client.solrj.SolrServerException: org.dspace.foresite.OREParserException: org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:242) at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:426) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:644) at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:598) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:149) at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148) at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130) at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88) at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30) at java.lang.Thread.run(Thread.java:748) ```

The third piece of evidence is that there are four more versions to the metadata chain than the resource map chain and the last four versions of the metadata are identical.

I fixed everything by linking the two disconnected OREs by mutating each's sysmeta and updating the ORE in the tail position with an ORE that includes the orphaned five data objects.

My best guess at this point is that the user didn't have an internet connectivity error but that MetacatUI somehow issued an invalid ORE that failed to index and caused the editor to get into a state it couldn't recover from. So the editor then created a brand new ORE to house the metadata it was working on which was missing a link to a previous ORE and missing its five data objects.

Here's the ORE that failed to parse:

RDF/XML for the ORE that failed to parse ```xml Diana LaScala-Gruenewald http://orcid.org/0000-0003-1501-0861 resource_map_urn:uuid:ef441988-711e-431c-8f74-750c8c4c40c2 2020-08-31T22:06:16.246Z urn:uuid:87430c14-c0e0-4aaf-b72c-acac096c6f29 urn:uuid:6237da25-1524-4c19-8c74-817bc4d4dbd4 urn:uuid:9b10fed8-4eee-442b-b86c-b98ff18639a3 urn:uuid:b1a41f76-1331-440b-aff7-2e8f6d1c3587 urn:uuid:f24e389f-d29c-4b5f-8b07-2f43d66903b2 urn:uuid:f538efff-74f8-46bb-a42e-77416c673f9c ```
Turtle version ```turtle @prefix rdf: . @prefix n0: . @prefix n1: . @prefix terms: . @prefix cito: . @prefix ter: . _:b0 rdf:type terms:Agent ; n0:name "Diana LaScala-Gruenewald" . n1:creator _:b1 . _:b1 rdf:type terms:Agent ; n0:name "http://orcid.org/0000-0003-1501-0861" . terms:identifier "resource_map_urn:uuid:ef441988-711e-431c-8f74-750c8c4c40c2" ; terms:modified "2020-08-31T22:06:16.246Z" ; rdf:type ter:ResourceMap . terms:identifier "urn:uuid:87430c14-c0e0-4aaf-b72c-acac096c6f29" ; cito:isDocumentedBy ; ter:isAggregatedBy . ter:aggregates ; ter:aggregates ; ter:aggregates ; ter:aggregates ; ter:aggregates . terms:identifier "urn:uuid:6237da25-1524-4c19-8c74-817bc4d4dbd4" ; cito:isDocumentedBy ; ter:isAggregatedBy . terms:identifier "urn:uuid:9b10fed8-4eee-442b-b86c-b98ff18639a3" ; cito:isDocumentedBy ; ter:isAggregatedBy . terms:identifier "urn:uuid:b1a41f76-1331-440b-aff7-2e8f6d1c3587" ; cito:isDocumentedBy ; ter:isAggregatedBy . terms:identifier "urn:uuid:f24e389f-d29c-4b5f-8b07-2f43d66903b2" ; cito:isDocumentedBy ; ter:isAggregatedBy . ter:describes . ter:aggregates ; ter:isDescribedBy . terms:identifier "urn:uuid:f538efff-74f8-46bb-a42e-77416c673f9c" ; cito:documents ; cito:documents ; cito:documents ; cito:documents ; cito:documents ; cito:documents ; cito:isDocumentedBy ; ter:isAggregatedBy . ```

The odd thing about the resource map above is that the PID doesn't match between the ResourceMap resource and the describes triple. I haven't gone so far as to confirm that's the cause of the ORE parser exception but it seems like a good start.

Curious what @laurenwalker and @csjx think about the details here and possible cause.

mbjones commented 3 years ago

The error message on this was similar to the one reported by @vchendrix in #1318

mbjones commented 3 years ago

I checked and the technical reason it doesn't parse is because the resource https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62 lacks a dc:identifier field:

<https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62>
        ter:describes  <https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62#aggregation> .

The error for this gets generated on line 466 of ResourceMapFactory when it fails to find the dc:identifier.

I think this is because there are two resources that have been conflated as the ResourceMap:

<https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Aef441988-711e-431c-8f74-750c8c4c40c2> a ore:ResourceMap;
  dc11:creator [
    a dc:Agent;
    foaf:name "http://orcid.org/0000-0003-1501-0861"
  ];
  dc:identifier "resource_map_urn:uuid:ef441988-711e-431c-8f74-750c8c4c40c2";
  dc:modified "2020-08-31T22:06:16.246Z" .

<https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62> ore:describes <https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62#aggregation> .

Notice how the identifier ending in 40c2 is typed as ResourceMap and has its dc:identifier but lacks a ore:describes for it's associated aggregation (40c2#aggregation, and somehow the bareae62 resource ore:describes another aggregation (ae62#aggregation) and is missing its dcterms:identifier, and it is not typed as a ResourceMap at all. There definitely shouldn't be two aggregations in the same document.

Here's what it should look like from a valid ResourceMap from an Arctic Data Center dataset:

<https://cn.dataone.org/cn/v2/resolve/resource_map_doi%3A10.18739%2FA2VM42Z3G> a ore:ResourceMap;
  dc:creator [
    a dc:Agent;
    foaf:name "DataONE R Client"
  ];
  dc:identifier "resource_map_doi:10.18739/A2VM42Z3G";
  dc:modified "2020-10-27T17:11:09Z"^^xsd:dateTime;
  ore:describes <https://cn.dataone.org/cn/v2/resolve/resource_map_doi%3A10.18739%2FA2VM42Z3G#aggregation> .

I have no idea how it got to be this way, but something went awry in generating that ORE.

ranicrab commented 3 years ago

Hi - I was going to update a file using the UI soon - but I should hold off, correct? There isn't a rush but I would like to update the file - if I need to I can update via R instead.

amoeba commented 3 years ago

Hey @ranicrab, sorry if I didn't make this clear when we were talking the other week. I think you should be good to edit/update the package via the web editor since I created a new resource map for you from scratch. Feel free to ping me on Slack as you do, or if you have any problems. The bug here only causes a small problem and can be fixed like we did before.

amoeba commented 3 years ago

Just talked with @ranicrab on Slack and found out my above is incorrect: Editing the fixed version of the package with the editor broke the package again. While regrettable, this seems like super useful information for reproducing and fixing the issue.

amoeba commented 3 years ago

Talked with Rani today about this and we decided to hold off on fixing the package broken by this until it's fixed so the user doesn't re-break things. After this is fixed, we should reach out to Rani and get the package on OPC fixed that as a metadata PID of urn:uuid:04de7d70-bf82-4372-817a-11cf825b9801. At least one resource map PID in the version chain is resource_map_urn:uuid:d0210dce-6149-4591-8a74-38ad31d951d5 so we'll also want to deal with that.

ranicrab commented 3 years ago

Hi! Just wanted to check in on this since I had a request today to see if we could restore the dataset on OPC.

laurenwalker commented 3 years ago

From Slack:

Hi again @lauren - in case this is helpful, Diana (the person submitting data that has been running into this bug) just tried to add a description to an existing data file (she didn't use the replace option - just was working on the metadata) and it seems to have triggered the same bug. Here is the original version: https://opc.dataone.org/view/doi:10.25494/P6JP4Z and here is the updated version after she tried to modify the data table description: https://opc.dataone.org/view/urn%3Auuid%3A2dbd614c-211d-4ddc-a0e8-3f156b59da8e - I thought she would have been able to modify the metadata only but I guess not..

laurenwalker commented 3 years ago

The bad news is that I have looked into this bug, which does seem to be a duplicate of #1318, and cannot replicate it after two full days of trying. If anyone has any debugging ideas, or @vchendrix and @ranicrab, if you have any further details on when/how these bugs occurred, please let me know.

The good news is that the issue reported by Rani as described in the Dec 10 comment is a different bug that has already been fixed on the develop branch and will be included with MetacatUI 2.14.1. The issue is that the editor was preventing successful saves of resource maps that contained data objects that the user didn't have changePermission permission on (see #1418).

I also fixed a bug related to package saving in commit 73badd5f10b602aa83607bbe6dde00da05c5863d that I believe may have been related to both of these save issues.

@ranicrab - I have fixed Diana's package here: https://opc.dataone.org/view/urn%3Auuid%3A2dbd614c-211d-4ddc-a0e8-3f156b59da8e but she should hold off on further edits until I get MetacatUI updated to 2.14.1 - I will let you know when that is ready.

laurenwalker commented 3 years ago

I am moving this issue to the Icebox pipeline since I have not been able to reproduce it. It won't be forgotten about, but it cannot be slated for a specific release until I can dedicate more time to reproducing or have more details on how it occurred.

robyngit commented 2 years ago

This bug, or one similar to it, seems to have occurred again again for a few users on the ADC.

Case 1

Case 2 & 3

dvirlar2 commented 1 year ago

I've encountered this in a few different datasets on the ADC over the past year. This week, there were two reports of this issue happening on the KNB.

Ben Halpern's Dataset: Ben had asked me to update this version of his dataset by adding two new files, a .tif file and a .csv file. I updated the dataset by uploading files via the web editor, and believe this is where the issue is occurring. This update resulted in the data objects becoming detached from the package, and the package no longer having a working resource map. I've since re-attached the data objects and assigned a working resource map to the dataset.

Jeremy Summer's Dataset: Today, we received an email alerting us that another dataset no longer had the data objects attached, nor did it have a working resource map. The PI seems to have assigned a new DOI, but the issue has persisted. This was likely them trying to troubleshoot, as reassigning a DOI has not been a solution in the past.

More context for Jeremy Summer's Dataset:

Code to Find Version History for Resource Map and Metadata PIDs
```r # Set nodes d1c <- dataone::D1Client("PROD", "urn:node:KNB") # get all versions of the metadata m_vers <- get_all_versions(d1c@mn, "doi:10.5063/F11G0JR2") # go through the list, working from the end, to find the latest version of the dataset # with a working resource map. # get all the resource map pids rm_vers <- get_all_versions(d1c@mn, "resource_map_urn:uuid:e464fd14-350f-41ab-9135-37f40666b79e") ```

I'll fix this second dataset, but providing some extra context and documentation per Matt's request.

robyngit commented 1 year ago

Here is a yet untested theory from @mbjones about what could be causing this issue: this could be an access control problem where people have some but not all of the rights they need to update the package components because of how group management works. To start investigating this, we should scrutinize the ACLs for each object in the package that was being updated. (See #1638 and #1640). To reconstruct the history of changes on packages, look at the metacat event log, where there is a record of which userID performed every CREATE and UPDATE operation on objects.

robyngit commented 1 year ago

Could potentially be related to improper handling of network instability issues? See https://github.com/NCEAS/metacatui/issues/2134

dvirlar2 commented 1 year ago

This bug popped up again on this dataset (most recent version to date): DOI:10.18739/A20Z70Z1H

The ORCID that updated this dataset belongs to Alyssa Mease: https://orcid.org/0000-0003-1870-6856

I had to use a solr query to find out who the orcid belonged to, and I saw an incorrect orcid (belonging to the recent editor) in the output. Here's the code:

df <- query(d1c@mn, list(q='identifier:"urn:uuid:4c940a87-d6fb-477b-b6bb-fd6bdeef73b2"',
                       fl = "rightsHolder, writePermission"),
            as = "data.frame")


And here's the output of df$writePermission: Screenshot 2023-08-07 at 3 19 56 PM

I'm not sure what happened such that the fifth output http://orcid.org/0000-0003-1870-6856fe4 has that fe4 at the end, but it seems to have first appeared on the 54th version of metadata pid: urn:uuid:38b9ea29-67ba-4d52-827d-922cbb8e0168.

Here's the 53rd version of the dataset, and here's the 54th version. From my perspective, nothing looks amiss.

mbjones commented 1 year ago

Just a note: the ORCID ending in 6586 appears to be duplicated in two access rules in the system metadata for version 54 (see https://arcticdata.io/metacat/d1/mn/v2/meta/urn:uuid:38b9ea29-67ba-4d52-827d-922cbb8e0168), one that looks fine, and one that has the stray fe4 appended on the end that @dvirlar2 raised. This is not present in version 53.

robyngit commented 3 weeks ago

From Slack:

We got three datasets today where PIs made updates through the website and the data files were removed. One of them got stuck on the spinning 'Submitting...' button, and one knows that she had an internet connectivity issue while trying to submit. Justin and I haven't been able to recreate these types of problems so we don't really have more details, but it does seems to be happening at an increasing rate It also seems more common for data packages with lots of data files. But maybe that's just because there's a higher likelihood that they'll run into connectivity issues since they take longer to upload