NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Upgrade old version of datacite documents on EZID #2003

Open taojing2002 opened 3 weeks ago

taojing2002 commented 3 weeks ago

We got this email from EZID:

Hello,

You are receiving this message because the EZID team has identified that an account associated with your email address includes DataCite DOI records registered in a deprecated schema version (<4.x).

As was indicated in our message about adding support for schema v4.5, DOI registrations using schema versions older than v4.x will no longer be supported by DataCite beginning in January 2025. Please see the DataCite documentation here (https://datacite.org/blog/deprecating-schema-3/) for additional details about this change.

Beginning in December of 2024, records in deprecated schema versions will be updated to the most recent version of the DataCite schema (v4.5) on your behalf to ensure the continued functioning of EZID. The changes are relatively minor, but to meet the requirements of this schema version, default values will be assigned to two specific fields in both EZID and the DataCite DOI registration:

resourceType will be set to "(:unav)"
resourceTypeGeneral will be set to "Other"

If you do not wish for these updates to be automatically applied, we encourage you to upgrade the schema version of your DOIs at your earliest convenience. All other account holders associated with these records will be contacted as well. If you have any questions, please reach out and we’ll be happy to provide additional guidance.

All the best,
Adam

Adam Buttrick, Product Manager
University of California Curation Center (UC3)
California Digital Library
University of California Office of the President

Since Metacat currently generates DataCite 4.3 metadata, Matt suggested we can simply trigger a re-registration of all DOIs that have older DataCite metadata versions. It should work. But if two servers share the same shoulder, we need to figure out which DOIs are hosted on which server.

taojing2002 commented 3 weeks ago

This is the list that came with the email: knb-help@nceas.ucsb.edu_dois.csv

rushirajnenuji commented 3 weeks ago

From thread:

Yes, it looks like we're currently using DataCite 4.3, and from the attached CSV it looks like we have ~2200 ADC DOIs, ~40 with prefix 10.25494_p6, and ~200 KNB DOIs. It seems like a lot (all?) of those are v3.x. But I think we should have the required metadata for the latest schema, so updating them sounds good. Examples:

  1. https://ezid.cdlib.org/manage/display_xml/doi:10.5063/f1gq6vvv
  2. https://ezid.cdlib.org/manage/display_xml/doi:10.5063/f13b5xhm
  3. https://ezid.cdlib.org/manage/display_xml/doi:10.25494/p6qp4h
  4. https://ezid.cdlib.org/manage/display_xml/doi:10.18739/a2fn10t0w
  5. https://ezid.cdlib.org/manage/display_xml/doi:10.18739/a29300
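The per-prefix counts above could be reproduced with something like the following. This is only a minimal sketch: the class and method names are hypothetical, and it assumes the DOIs have already been read out of the CSV into a list of strings, since the CSV layout is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts DOIs per registrant prefix (the part before the first "/").
final class DoiPrefixCounter {

    // Strips an optional "doi:" scheme and returns the prefix, e.g. "10.5063".
    static String prefixOf(String doi) {
        String bare = doi.trim().toLowerCase();
        if (bare.startsWith("doi:")) {
            bare = bare.substring(4);
        }
        int slash = bare.indexOf('/');
        return slash < 0 ? bare : bare.substring(0, slash);
    }

    // Returns a sorted map of prefix -> number of DOIs with that prefix.
    static Map<String, Integer> countByPrefix(List<String> dois) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doi : dois) {
            counts.merge(prefixOf(doi), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The five example DOIs listed above.
        List<String> examples = Arrays.asList(
                "doi:10.5063/f1gq6vvv",
                "doi:10.5063/f13b5xhm",
                "doi:10.25494/p6qp4h",
                "doi:10.18739/a2fn10t0w",
                "doi:10.18739/a29300");
        System.out.println(countByPrefix(examples));
    }
}
```

Grouping only by prefix will not separate two servers that share a shoulder; that still needs the redirect target or a lookup against each server.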

mbjones commented 3 weeks ago

@taojing2002 Note that I think this is a duplicate of issue #1949, and they both should probably close when we fix it.

Would it be worthwhile to bring the Metacat release for DataCite up to 4.5 before we run this whole re-registration? Because DataCite 4.5 is compatible with 4.4, and 4.4 is compatible with 4.3, bringing it up to date might only require a one-line code change to update the schema version header:

diff --git a/src/edu/ucsb/nceas/metacat/doi/datacite/DataCiteMetadataFactory.java b/src/edu/ucsb/nceas/metacat/doi/datacite/DataCiteMetadataFactory.java
index 0a7e4ab1..f42cd293 100644
--- a/src/edu/ucsb/nceas/metacat/doi/datacite/DataCiteMetadataFactory.java
+++ b/src/edu/ucsb/nceas/metacat/doi/datacite/DataCiteMetadataFactory.java
@@ -69,7 +69,7 @@ public abstract class DataCiteMetadataFactory {
     public static final String EN = "en";
     public static final String XML_LANG= "xml:lang";
     public static final String NAMESPACE = "http://datacite.org/schema/kernel-4";
-    public static final String SCHEMALOCATION = "https://schema.datacite.org/meta/kernel-4.3/metadata.xsd";
+    public static final String SCHEMALOCATION = "https://schema.datacite.org/meta/kernel-4.5/metadata.xsd";
     public static final String RESOURCE = "resource";
     public static final String CREATORS = "creators";
     public static final String CREATOR = "creator";
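To illustrate why a one-line change may be enough, here is a rough sketch of how constants like these typically end up in the root element of a generated document. It is not Metacat's actual code: the class name, the method names, and the use of the JDK DOM API are assumptions made for illustration, and only the two constant values are taken from the diff above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: builds a bare DataCite <resource> root element.
final class DataCiteRootSketch {
    static final String NAMESPACE = "http://datacite.org/schema/kernel-4";
    static final String SCHEMALOCATION = "https://schema.datacite.org/meta/kernel-4.5/metadata.xsd";
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    // Creates the root element and attaches xsi:schemaLocation as "<namespace> <xsd location>".
    static Element buildRoot() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element resource = doc.createElementNS(NAMESPACE, "resource");
            resource.setAttributeNS(XSI, "xsi:schemaLocation", NAMESPACE + " " + SCHEMALOCATION);
            doc.appendChild(resource);
            return resource;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create an XML document builder", e);
        }
    }

    // Convenience accessor for the attribute value written by buildRoot().
    static String schemaLocationAttribute() {
        return buildRoot().getAttributeNS(XSI, "schemaLocation");
    }

    public static void main(String[] args) {
        System.out.println(schemaLocationAttribute());
    }
}
```

Because the kernel-4 namespace is the same for 4.3 and 4.5, only the XSD location in the attribute changes in this sketch. Whether the generated metadata also validates against 4.5 would still need to be checked.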

taojing2002 commented 3 weeks ago

Yeah, we can upgrade to 4.5 in Metacat 3.1.0. Here is the ticket:

https://github.com/NCEAS/metacat/issues/2005

taojing2002 commented 1 day ago

After analyzing the redirect URLs, I found that those DOIs come from ADC, KNB, and OPC. I wrote and ran a script to submit requests to update them. I believe all of the OPC and ADC ones worked. However, dozens of KNB DOIs are not in the current shoulder list in the KNB configuration, so they failed; I will leave those updates to EZID. The following DOIs were not submitted because they have no system metadata records in our member nodes:

10.18739/a2rp4d
redirect to https://arcticdata.io/catalog/#view/urn:uuid:1b29fabe-8930-48eb-b48c-dc7f99c2b077
10.5063/f1vm496b
redirect to https://github.com/ropensci/redland-bindings/tree/master/R/redland
10.5063/f1qv3jgm
redirect to https://github.com/ropensci/datapack
10.5063/f1m61h5x
redirect to https://github.com/DataONEorg/rdataone
10.5063/f1gf0rf6
redirect to https://github.com/NCEAS/recordr

I will leave those five objects to the EZID update as well.
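For reference, the redirect analysis can be sketched roughly as below. This is an illustration rather than the script that was actually run: the class name and labels are hypothetical, only hosts that appear in this thread are mapped, and the OPC host is deliberately left out because it is not given here.

```java
import java.net.URI;

// Hypothetical helper: guesses which repository a DOI belongs to from its redirect target.
final class RedirectClassifier {

    // Returns a coarse label based on the host of the redirect URL.
    static String classify(String redirectUrl) {
        String host = URI.create(redirectUrl.trim()).getHost();
        if (host == null) {
            return "unknown";
        }
        host = host.toLowerCase();
        if (host.equals("arcticdata.io") || host.endsWith(".arcticdata.io")) {
            return "ADC";
        }
        if (host.equals("knb.ecoinformatics.org")) {
            return "KNB";
        }
        if (host.equals("github.com")) {
            return "software (GitHub)";
        }
        return "unknown"; // e.g. OPC, whose host would need to be added
    }

    public static void main(String[] args) {
        System.out.println(classify(
                "https://arcticdata.io/catalog/#view/urn:uuid:1b29fabe-8930-48eb-b48c-dc7f99c2b077"));
        System.out.println(classify("https://github.com/NCEAS/recordr"));
    }
}
```

A host match only says where the DOI resolves. As the first DOI in the list shows, a DOI can resolve to ADC and still have no system metadata record on the member node.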

mbjones commented 1 day ago

@taojing2002 thanks!

@rushirajnenuji or @doulikecookiedough could one of you update these software records (and maybe enhance them with ORCIDs/RORs/Funding info where that isn't too burdensome)?

rushirajnenuji commented 1 day ago

Hi @mbjones - yes, will do. It doesn't seem that any of those were registered with XML: no XML object is found in EZID, so it looks like the default DataCite profile was used to generate these records.

I'll use one of our latest citation objects as a reference and update each of these with new DataCite XML targeting version 4.5.

10.5063/f1vm496b redirect to https://github.com/ropensci/redland-bindings/tree/master/R/redland
10.5063/f1qv3jgm redirect to https://github.com/ropensci/datapack
10.5063/f1m61h5x redirect to https://github.com/DataONEorg/rdataone
10.5063/f1gf0rf6 redirect to https://github.com/NCEAS/recordr
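If the new XML is pushed through EZID's HTTP API instead of the web interface, the request body has to be ANVL-encoded. The sketch below shows only that encoding step, under the assumption (based on my reading of the EZID API documentation) that metadata is sent as `name: value` lines, that the DataCite XML goes in an element named `datacite`, and that `%`, CR, and LF (plus `:` in names) must be percent-encoded. The class and method names are hypothetical, and no request is sent.

```java
// Hypothetical helper: encodes one metadata element as an ANVL "name: value" line.
final class AnvlSketch {

    // Percent-encodes the characters that would break ANVL parsing.
    static String escape(String text, boolean isName) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '%') {
                out.append("%25");
            } else if (c == '\n') {
                out.append("%0A");
            } else if (c == '\r') {
                out.append("%0D");
            } else if (c == ':' && isName) {
                out.append("%3A");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Builds a single ANVL line; the caller joins lines with "\n".
    static String anvlLine(String name, String value) {
        return escape(name, true) + ": " + escape(value, false);
    }

    public static void main(String[] args) {
        // Placeholder XML only; a real record would carry the full DataCite 4.5 document.
        String xml = "<resource>\n  <!-- DataCite 4.5 metadata -->\n</resource>";
        System.out.println(anvlLine("datacite", xml));
    }
}
```

Encoding the newlines keeps a multi-line XML document on one ANVL line, which is the reason the escaping matters for the `datacite` element in particular.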

doulikecookiedough commented 1 day ago

@mbjones I can help with creating/enhancing the datacite.xml documents for the 4 GitHub repos. Is my assumption correct that the DOI that redirects to the ADC dataset does not need updating, since it is just a dataset?

@rushirajnenuji Could you please help me update the EZID records once the new citation documents are ready? Sorry, but I don't have access to the EZID online interface or the credentials to push changes via the Python client.

doulikecookiedough commented 1 day ago

I've connected with @rushirajnenuji on Slack and will assist with getting this completed.