NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Updating portal-1.1.0 document doesn't index the `obsoletedBy` field #1525

Closed laurenwalker closed 3 years ago

laurenwalker commented 3 years ago

This is a blocker for NCEAS/metacatui#1631

Steps to reproduce

  1. Create a https://purl.dataone.org/portals-1.0.0 document with the portal editor on test.arcticdata.io
  2. Update the portal with the MetacatUI portal editor in the feature-rule-groups MetacatUI branch, which upgrades the portal object to the new https://purl.dataone.org/portals-1.1.0 formatId. (This could probably also be reproduced via curl.)
  3. Note that the obsoletes and obsoletedBy fields in the system metadata of both objects are correct.
  4. Query for the old portal object in Solr, and note that the obsoletedBy Solr field is not populated, even though it is in the system metadata.

Indexing does work for portals that are updated without changing schema versions (i.e., the old and new versions are both https://purl.dataone.org/portals-1.0.0).
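For anyone repeating step 4 outside the browser, here is a minimal Python sketch that builds the Solr query URL for one object. It assumes the member node exposes Solr through the DataONE query endpoint at `/metacat/d1/mn/v2/query/solr/`; the function name `build_obsolescence_query` is hypothetical, and non-public objects would also need an authentication token that is not shown here.

```python
from urllib.parse import quote, urlencode

# Assumption: the member node exposes Solr through the DataONE query endpoint.
BASE_URL = "https://test.arcticdata.io/metacat/d1/mn/v2/query/solr/"


def build_obsolescence_query(pid: str, base_url: str = BASE_URL) -> str:
    """Return a URL that asks Solr for the version-chain fields of one object."""
    params = {
        "q": f'id:"{pid}"',  # exact match on the object identifier
        "fl": "id,seriesId,obsoletes,obsoletedBy,formatId",
        "wt": "json",
    }
    # quote_via=quote percent-encodes spaces as %20 instead of "+".
    return base_url + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_obsolescence_query("urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb"))
```

If `obsoletedBy` is absent from the returned document while the system metadata contains it, the object is affected.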

Example

Here is an example that was created by following the above steps.

Here is the original portal document created by the existing portal editor at test.arcticdata.io:

https://test.arcticdata.io/metacat/d1/mn/v2/meta/urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb

<ns3:systemMetadata>
<serialVersion>1</serialVersion>
<identifier>urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</identifier>
<formatId>https://purl.dataone.org/portals-1.0.0</formatId>
<size>361</size>
<checksum algorithm="MD5">d2739a7fb9b26440eff6112f46e3374f</checksum>
<submitter>http://orcid.org/0000-0003-2192-431X</submitter>
<rightsHolder>http://orcid.org/0000-0003-2192-431X</rightsHolder>
<accessPolicy>
<allow>
<subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
<permission>read</permission>
<permission>write</permission>
<permission>changePermission</permission>
</allow>
</accessPolicy>
<obsoletedBy>urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd</obsoletedBy>
<archived>false</archived>
<dateUploaded>2021-08-18T20:49:06.071+00:00</dateUploaded>
<dateSysMetadataModified>2021-08-18T20:50:32.904+00:00</dateSysMetadataModified>
<originMemberNode>urn:node:mnTestARCTIC</originMemberNode>
<authoritativeMemberNode>urn:node:mnTestARCTIC</authoritativeMemberNode>
<seriesId>urn:uuid:ce6625c2-6007-4d74-8227-0a08e7158fe0</seriesId>
<fileName>678765432.xml</fileName>
</ns3:systemMetadata>

Here is the new portal document created by the portal editor in the feature-rule-groups branch:

https://test.arcticdata.io/metacat/d1/mn/v2/meta/urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd

<ns3:systemMetadata>
<serialVersion>0</serialVersion>
<identifier>urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd</identifier>
<formatId>https://purl.dataone.org/portals-1.1.0</formatId>
<size>441</size>
<checksum algorithm="MD5">16a4a840ab63dcc0df1505eb7727add9</checksum>
<submitter>http://orcid.org/0000-0003-2192-431X</submitter>
<rightsHolder>http://orcid.org/0000-0003-2192-431X</rightsHolder>
<accessPolicy>
<allow>
<subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
<permission>read</permission>
<permission>write</permission>
<permission>changePermission</permission>
</allow>
</accessPolicy>
<obsoletes>urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</obsoletes>
<archived>false</archived>
<dateUploaded>2021-08-18T20:50:30.782+00:00</dateUploaded>
<dateSysMetadataModified>2021-08-18T20:50:33.099+00:00</dateSysMetadataModified>
<originMemberNode>urn:node:mnTestARCTIC</originMemberNode>
<authoritativeMemberNode>urn:node:mnTestARCTIC</authoritativeMemberNode>
<seriesId>urn:uuid:ce6625c2-6007-4d74-8227-0a08e7158fe0</seriesId>
<fileName>678765432.xml</fileName>
</ns3:systemMetadata>

Here are the Solr documents for both. Notice that the obsoletedBy field is missing from the first (obsoleted) document:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="q">
seriesId:"urn:uuid:ce6625c2-6007-4d74-8227-0a08e7158fe0"
</str>
<str name="fl">id,seriesId,obsoletes,obsoletedBy,formatId</str>
<str name="fq">
{omitted for brevity}
</str>
<str name="wt">javabin</str>
<str name="version">2</str>
</lst>
</lst>
<result name="response" numFound="2" start="0" numFoundExact="true">
<doc>
<str name="id">urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</str>
<str name="seriesId">urn:uuid:ce6625c2-6007-4d74-8227-0a08e7158fe0</str>
<str name="formatId">https://purl.dataone.org/portals-1.0.0</str>
</doc>
<doc>
<str name="id">urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd</str>
<str name="seriesId">urn:uuid:ce6625c2-6007-4d74-8227-0a08e7158fe0</str>
<str name="formatId">https://purl.dataone.org/portals-1.1.0</str>
<str name="obsoletes">urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</str>
</doc>
</result>
</response>
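The comparison above can be automated. The following is a minimal sketch, not part of Metacat, that parses one system metadata document and one Solr `<doc>` and reports which version-chain fields are present in the system metadata but missing or different in the index. The sample strings are trimmed copies of the documents above; the `xmlns:ns3` declaration is added as an assumption so the excerpt parses on its own, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CHAIN_FIELDS = ("obsoletes", "obsoletedBy")

# Trimmed from the system metadata above; the namespace declaration is assumed.
SYSMETA = """
<ns3:systemMetadata xmlns:ns3="http://ns.dataone.org/service/types/v2.0">
<identifier>urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</identifier>
<obsoletedBy>urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd</obsoletedBy>
</ns3:systemMetadata>
"""

# Trimmed from the Solr response above.
SOLR_DOC = """
<doc>
<str name="id">urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</str>
<str name="formatId">https://purl.dataone.org/portals-1.0.0</str>
</doc>
"""


def sysmeta_chain_fields(sysmeta_xml: str) -> dict:
    """Extract obsoletes/obsoletedBy from a system metadata document."""
    root = ET.fromstring(sysmeta_xml.strip())
    found = {}
    for name in CHAIN_FIELDS:
        element = root.find(name)  # child elements carry no namespace
        if element is not None and element.text:
            found[name] = element.text.strip()
    return found


def solr_doc_fields(doc_xml: str) -> dict:
    """Map each field name in a Solr <doc> element to its text value."""
    doc = ET.fromstring(doc_xml.strip())
    return {child.get("name"): (child.text or "").strip() for child in doc}


def missing_from_index(sysmeta_xml: str, doc_xml: str) -> list:
    """List chain fields set in system metadata but not matching in Solr."""
    expected = sysmeta_chain_fields(sysmeta_xml)
    indexed = solr_doc_fields(doc_xml)
    return sorted(n for n, v in expected.items() if indexed.get(n) != v)


if __name__ == "__main__":
    print(missing_from_index(SYSMETA, SOLR_DOC))
```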

laurenwalker commented 3 years ago

Upon further testing, I am noticing that this happens when you update a portals-1.1.0 document as well. I just updated the portal document in the above description and the obsoletedBy field on the portals-1.1.0 document was not indexed, either.

So this happens when a portal is upgraded from 1.0.0 to 1.1.0 AND when a 1.1.0 document is updated and stays a 1.1.0 document.

Newest portal in the chain:

https://test.arcticdata.io/metacat/d1/mn/v2/meta/urn:uuid:32f3597d-9c12-48a5-928c-de2d688a9d56

<ns3:systemMetadata>
<serialVersion>0</serialVersion>
<identifier>urn:uuid:32f3597d-9c12-48a5-928c-de2d688a9d56</identifier>
<formatId>https://purl.dataone.org/portals-1.1.0</formatId>
<size>452</size>
<checksum algorithm="MD5">029a72e6cfd5dbfe344f127d5a8e048f</checksum>
<submitter>http://orcid.org/0000-0003-2192-431X</submitter>
<rightsHolder>http://orcid.org/0000-0003-2192-431X</rightsHolder>
<accessPolicy>
<allow>
<subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
<permission>read</permission>
<permission>write</permission>
<permission>changePermission</permission>
</allow>
</accessPolicy>
<obsoletes>urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd</obsoletes>
<archived>false</archived>
<dateUploaded>2021-08-18T21:12:57.323+00:00</dateUploaded>
<dateSysMetadataModified>2021-08-18T21:12:59.873+00:00</dateSysMetadataModified>
<originMemberNode>urn:node:mnTestARCTIC</originMemberNode>
<authoritativeMemberNode>urn:node:mnTestARCTIC</authoritativeMemberNode>
<seriesId>urn:uuid:ce6625c2-6007-4d74-8227-0a08e7158fe0</seriesId>
<fileName>678765432.xml</fileName>
</ns3:systemMetadata>

Solr documents for the entire chain:

<result name="response" numFound="3" start="0" numFoundExact="true">
<doc>
<str name="id">urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</str>
<str name="formatId">https://purl.dataone.org/portals-1.0.0</str>
<str name="title">New portal creation test</str>
</doc>
<doc>
<str name="id">urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd</str>
<str name="formatId">https://purl.dataone.org/portals-1.1.0</str>
<str name="obsoletes">urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb</str>
<str name="title">New portal creation test</str>
</doc>
<doc>
<str name="id">urn:uuid:32f3597d-9c12-48a5-928c-de2d688a9d56</str>
<str name="formatId">https://purl.dataone.org/portals-1.1.0</str>
<str name="obsoletes">urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd</str>
<str name="title">New portal creation test - updated</str>
</doc>
</result>
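The gaps in this chain can also be found from the Solr results alone, because every indexed `obsoletes` value implies an `obsoletedBy` value on the older document. Below is a small sketch under that assumption; the dictionaries mirror the three Solr documents above, and the function names are hypothetical.

```python
def expected_obsoleted_by(docs):
    """Map each obsoleted id to its successor, inferred from `obsoletes`."""
    return {doc["obsoletes"]: doc["id"] for doc in docs if "obsoletes" in doc}


def find_unindexed_links(docs):
    """Return (id, expected_successor) pairs whose obsoletedBy is not indexed."""
    expected = expected_obsoleted_by(docs)
    problems = []
    for doc in docs:
        successor = expected.get(doc["id"])
        if successor is not None and doc.get("obsoletedBy") != successor:
            problems.append((doc["id"], successor))
    return problems


# The three Solr documents from the chain above, reduced to the relevant fields.
CHAIN = [
    {"id": "urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb"},
    {
        "id": "urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd",
        "obsoletes": "urn:uuid:8ab0065b-59d5-4b66-a0f6-b227839d91fb",
    },
    {
        "id": "urn:uuid:32f3597d-9c12-48a5-928c-de2d688a9d56",
        "obsoletes": "urn:uuid:d2c0420c-da89-409b-baed-46facd1b92cd",
    },
]

if __name__ == "__main__":
    for pid, successor in find_unindexed_links(CHAIN):
        print(f"{pid} should have obsoletedBy={successor}")
```

For this chain the check flags the two older documents and leaves the newest one alone, which matches what the Solr output shows.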

taojing2002 commented 3 years ago

I can reproduce the issue by using the metadata editor to generate EML objects on test.arcticdata.io. However, I can't reproduce it on dev.nceas.ucsb.edu. I need to investigate further to see whether the issue is specific to the test.arcticdata.io instance.

taojing2002 commented 3 years ago

To minimize differences, I used curl commands to create and update objects on both test.arcticdata.io and dev.nceas.ucsb.edu. The results show that the issue is specific to test.arcticdata.io and occurs when Metacat tries to reindex an obsoleted object during the update process. This error appears only on test.arcticdata.io:

metacat-index 20210820-10:55:06: [ERROR]: SolrIndex.insetFields - could not update the solr index for the object jtao.164.1 since undefined field: "indexeddate" [edu.ucsb.nceas.metacat.index.SolrIndex:insertFields:521]
org.apache.solr.common.SolrException: undefined field: "indexeddate"
    at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1247) ~[solr-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:37:16]
    at edu.ucsb.nceas.metacat.index.SolrIndex.insertFields(SolrIndex.java:477) [classes/:?]
    at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:157) [classes/:?]
    at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:120) [classes/:?]
    at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148) [hazelcast-client-2.4.1.jar:2.4.1]
    at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130) [hazelcast-client-2.4.1.jar:2.4.1]
    at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88) [hazelcast-client-2.4.1.jar:2.4.1]
    at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30) [hazelcast-client-2.4.1.jar:2.4.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]

Somehow a new Solr field, indexeddate, was generated.
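To see which objects were skipped for this reason, the metacat-index log can be scanned for the error shown above. This is a rough sketch that assumes every such failure is logged in exactly the format of that one line; the pattern and function name are hypothetical and not part of Metacat.

```python
import re
from collections import defaultdict

# Pattern modelled on the single error line quoted above (an assumption).
UNDEFINED_FIELD = re.compile(
    r"could not update the solr index for the object (?P<pid>\S+) "
    r'since undefined field: "(?P<field>[^"]+)"'
)


def failed_objects_by_field(log_lines):
    """Group the ids of objects that failed indexing by the undefined field."""
    failures = defaultdict(list)
    for line in log_lines:
        match = UNDEFINED_FIELD.search(line)
        if match:
            failures[match.group("field")].append(match.group("pid"))
    return dict(failures)


SAMPLE_LOG = [
    'metacat-index 20210820-10:55:06: [ERROR]: SolrIndex.insetFields - could not '
    'update the solr index for the object jtao.164.1 since undefined field: '
    '"indexeddate" [edu.ucsb.nceas.metacat.index.SolrIndex:insertFields:521]',
    'org.apache.solr.common.SolrException: undefined field: "indexeddate"',
]

if __name__ == "__main__":
    print(failed_objects_by_field(SAMPLE_LOG))
```

The resulting identifiers are the objects that would need to be reindexed once the schema problem is resolved.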

laurenwalker commented 3 years ago

Thanks for looking into this, Jing. Does anything need to be changed in Metacat to prevent this from happening again? Or is this just a case of the cached schema file needing to be refreshed?

taojing2002 commented 3 years ago

I added a new ticket for the cached schema file to address this issue: https://github.com/NCEAS/metacat/issues/1529