Open amoeba opened 3 years ago
The error message on this was similar to the one reported by @vchendrix in #1318
I checked and the technical reason it doesn't parse is because the resource https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62
lacks a dc:identifier
field:
<https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62>
ter:describes <https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62#aggregation> .
The error for this gets generated on line 466 of ResourceMapFactory when it fails to find the dc:identifier
.
I think this is because there are two resources that have been conflated as the ResourceMap
:
<https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Aef441988-711e-431c-8f74-750c8c4c40c2> a ore:ResourceMap;
dc11:creator [
a dc:Agent;
foaf:name "http://orcid.org/0000-0003-1501-0861"
];
dc:identifier "resource_map_urn:uuid:ef441988-711e-431c-8f74-750c8c4c40c2";
dc:modified "2020-08-31T22:06:16.246Z" .
<https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62> ore:describes <https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62#aggregation> .
Notice how the identifier ending in 40c2
is typed as ResourceMap
and has its dc:identifier
but lacks a ore:describes
for it's associated aggregation (40c2#aggregation
, and somehow the bareae62
resource ore:describes
another aggregation (ae62#aggregation
) and is missing its dcterms:identifier
, and it is not typed as a ResourceMap at all. There definitely shouldn't be two aggregations in the same document.
Here's what it should look like from a valid ResourceMap from an Arctic Data Center dataset:
<https://cn.dataone.org/cn/v2/resolve/resource_map_doi%3A10.18739%2FA2VM42Z3G> a ore:ResourceMap;
dc:creator [
a dc:Agent;
foaf:name "DataONE R Client"
];
dc:identifier "resource_map_doi:10.18739/A2VM42Z3G";
dc:modified "2020-10-27T17:11:09Z"^^xsd:dateTime;
ore:describes <https://cn.dataone.org/cn/v2/resolve/resource_map_doi%3A10.18739%2FA2VM42Z3G#aggregation> .
I have no idea how it got to be this way, but something went awry in generating that ORE.
Hi - I was going to update a file using the UI soon - but I should hold off, correct? There isn't a rush but I would like to update the file - if I need to I can update via R instead.
Hey @ranicrab, sorry if I didn't make this clear when we were talking the other week. I think you should be good to edit/update the package via the web editor since I created a new resource map for you from scratch. Feel free to ping me on Slack as you do, or if you have any problems. The bug here only causes a small problem and can be fixed like we did before.
Just talked with @ranicrab on Slack and found out my above is incorrect: Editing the fixed version of the package with the editor broke the package again. While regrettable, this seems like super useful information for reproducing and fixing the issue.
Talked with Rani today about this and we decided to hold off on fixing the package broken by this until it's fixed so the user doesn't re-break things. After this is fixed, we should reach out to Rani and get the package on OPC fixed that as a metadata PID of urn:uuid:04de7d70-bf82-4372-817a-11cf825b9801
. At least one resource map PID in the version chain is resource_map_urn:uuid:d0210dce-6149-4591-8a74-38ad31d951d5
so we'll also want to deal with that.
Hi! Just wanted to check in on this since I had a request today to see if we could restore the dataset on OPC.
From Slack:
Hi again @lauren - in case this is helpful, Diana (the person submitting data that has been running into this bug) just tried to add a description to an existing data file (she didn't use the replace option - just was working on the metadata) and it seems to have triggered the same bug. Here is the original version: https://opc.dataone.org/view/doi:10.25494/P6JP4Z and here is the updated version after she tried to modify the data table description: https://opc.dataone.org/view/urn%3Auuid%3A2dbd614c-211d-4ddc-a0e8-3f156b59da8e - I thought she would have been able to modify the metadata only but I guess not..
The bad news is that I have looked into this bug, which does seem to be a duplicate of #1318, and cannot replicate it after two full days of trying. If anyone has any debugging ideas, or @vchendrix and @ranicrab, if you have any further details on when/how these bugs occurred, please let me know.
The good news is that the issue reported by Rani as described in the Dec 10 comment is a different bug that has already been fixed on the develop
branch and will be included with MetacatUI 2.14.1. The issue is that the editor was preventing successful saves of resource maps that contained data objects that the user didn't have changePermission
permission on (see #1418).
I also fixed a bug related to package saving in commit 73badd5f10b602aa83607bbe6dde00da05c5863d that I believe may have been related to both of these save issues.
@ranicrab - I have fixed Diana's package here: https://opc.dataone.org/view/urn%3Auuid%3A2dbd614c-211d-4ddc-a0e8-3f156b59da8e but she should hold off on further edits until I get MetacatUI updated to 2.14.1 - I will let you know when that is ready.
I am moving this issue to the Icebox pipeline since I have not been able to reproduce it. It won't be forgotten about, but it cannot be slated for a specific release until I can dedicate more time to reproducing or have more details on how it occurred.
This bug, or one similar to it, seems to have occurred again again for a few users on the ADC.
I've encountered this in a few different datasets on the ADC over the past year. This week, there were two reports of this issue happening on the KNB.
Ben Halpern's Dataset: Ben had asked me to update this version of his dataset by adding two new files, a .tif file and a .csv file. I updated the dataset by uploading files via the web editor, and believe this is where the issue is occurring. This update resulted in the data objects becoming detached from the package, and the package no longer having a working resource map. I've since re-attached the data objects and assigned a working resource map to the dataset.
Jeremy Summer's Dataset: Today, we received an email alerting us that another dataset no longer had the data objects attached, nor did it have a working resource map. The PI seems to have assigned a new DOI, but the issue has persisted. This was likely them trying to troubleshoot, as reassigning a DOI has not been a solution in the past.
More context for Jeremy Summer's Dataset:
resource_map_urn:uuid:67f604c9-3748-4fd5-a733-65582fb9f340
and resource_map_urn:uuid:7288960c-a9c2-4b0b-ac6d-b6271c2482dd
, respectively. However, they have the same metadata pid: urn:uuid:60291cb3-139e-43e2-ac8a-3182993b2e3f
. Also, when you look at the 13th rm's dataset landing page, the package ID is that of the 14th rm. The 14th rm's landing page looks correct, in terms of the url and package ID lining up.I'll fix this second dataset, but providing some extra context and documentation per Matt's request.
Here is a yet untested theory from @mbjones about what could be causing this issue: this could be an access control problem where people have some but not all of the rights they need to update the package components because of how group management works. To start investigating this, we should scrutinize the ACLs for each object in the package that was being updated. (See #1638 and #1640). To reconstruct the history of changes on packages, look at the metacat event log, where there is a record of which userID performed every CREATE and UPDATE operation on objects.
Could potentially be related to improper handling of network instability issues? See https://github.com/NCEAS/metacatui/issues/2134
This bug popped up again on this dataset (most recent version to date): DOI:10.18739/A20Z70Z1H
The ORCID that updated this dataset belongs to Alyssa Mease: https://orcid.org/0000-0003-1870-6856
I had to use a solr query to find out who the orcid belonged to, and I saw an incorrect orcid (belonging to the recent editor) in the output. Here's the code:
df <- query(d1c@mn, list(q='identifier:"urn:uuid:4c940a87-d6fb-477b-b6bb-fd6bdeef73b2"',
fl = "rightsHolder, writePermission"),
as = "data.frame")
And here's the output of df$writePermission
:
I'm not sure what happened such that the fifth output http://orcid.org/0000-0003-1870-6856fe4
has that fe4
at the end, but it seems to have first appeared on the 54th version of metadata pid: urn:uuid:38b9ea29-67ba-4d52-827d-922cbb8e0168
.
Here's the 53rd version of the dataset, and here's the 54th version. From my perspective, nothing looks amiss.
Just a note: the ORCID ending in 6586
appears to be duplicated in two access rules in the system metadata for version 54 (see https://arcticdata.io/metacat/d1/mn/v2/meta/urn:uuid:38b9ea29-67ba-4d52-827d-922cbb8e0168), one that looks fine, and one that has the stray fe4
appended on the end that @dvirlar2 raised. This is not present in version 53.
From Slack:
We got three datasets today where PIs made updates through the website and the data files were removed. One of them got stuck on the spinning 'Submitting...' button, and one knows that she had an internet connectivity issue while trying to submit. Justin and I haven't been able to recreate these types of problems so we don't really have more details, but it does seems to be happening at an increasing rate It also seems more common for data packages with lots of data files. But maybe that's just because there's a higher likelihood that they'll run into connectivity issues since they take longer to upload
Had an odd one come in today from OPC that Rani reported and I looked into:
A user was using the editor UI to update a package with one metadata record and five data objects (https://opc.dataone.org/view/doi%3A10.25494%2FP6FS3P) and ended up with a package with only the metadata that was disconnected from the previous version. The user said they saw no errors during the process. This might be hard to track down but I thought I'd write it up so we might be able to at least look at the code and see if something like a flakey internet connection or an RDF serialization bug might cause the behavior. At the very least, we'll have this ticket in case we see it again.
Many more details follow...
The user went from a package with with the correct relationships:
urn:uuid:caeb7433-56eb-4951-9725-0bcea98227e6
doi%3A10.25494%2FP6FS3P
To an updated version:
resource_map_urn:uuid:645e12ad-993d-4574-836b-e75c6f7c6160
urn:uuid:1c4bb79e-fbc1-4ae3-94ff-446833e0ab3a
Interestingly, the second ORE above wasn't obsoleted by anything. This is a good clue.
The second interesting clue is that the first ORE is obsoleted by a third, intermediate ORE,
resource_map_urn:uuid:cbafbb8b-002f-4269-a731-7cc168a3ae62
which showed up in the Tomcat logs with various ORE parser exceptions, see below:ORE parser errors
``` org.dspace.foresite.OREException: No Identifer statement was found for the resourceMap resource ('https://cn.dataone.org/cn/v2/resolve/resource_map_urn%3Auuid%3Acbafbb8b-002f-4269-a731-7cc168a3ae62') at org.dataone.ore.ResourceMapFactory.parseResourceMap(ResourceMapFactory.java:467) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap._init(ForesiteResourceMap.java:160) at org.dataone.cn.indexer.resourcemap.ForesiteResourceMap.The third piece of evidence is that there are four more versions to the metadata chain than the resource map chain and the last four versions of the metadata are identical.
I fixed everything by linking the two disconnected OREs by mutating each's sysmeta and updating the ORE in the tail position with an ORE that includes the orphaned five data objects.
My best guess at this point is that the user didn't have an internet connectivity error but that MetacatUI somehow issued an invalid ORE that failed to index and caused the editor to get into a state it couldn't recover from. So the editor then created a brand new ORE to house the metadata it was working on which was missing a link to a previous ORE and missing its five data objects.
Here's the ORE that failed to parse:
RDF/XML for the ORE that failed to parse
```xmlTurtle version
```turtle @prefix rdf:The odd thing about the resource map above is that the PID doesn't match between the
ResourceMap
resource and thedescribes
triple. I haven't gone so far as to confirm that's the cause of the ORE parser exception but it seems like a good start.Curious what @laurenwalker and @csjx think about the details here and possible cause.