CDLUC3 / ezid

MIT License

Upgrade from DataCite schema 4.4 to 4.5 #511

Closed: mariagould closed this issue 4 months ago

mariagould commented 1 year ago

When DataCite's schema 4.5 is public, we need to upgrade EZID to support DOIs registered with this version of the schema. Ideally this will be done as soon as possible following the public release of the new version, so that users can take advantage of the new metadata fields available, and so that EZID does not fall behind in its support for DataCite DOIs.

Requirements:

Changes in version 4.5 that need to be reflected in EZID are as follows:

Full schema documentation:

Notes from DataCite:

Key milestones

Tests

Test documentation

Deployment plan

jsjiang commented 9 months ago

The following static pages need updates for Schema 4.5:

Note:

jsjiang commented 9 months ago
jsjiang commented 8 months ago

Records by schema versions (https://doi.datacite.org/providers/cdlco/dois):

Test cases:

jsjiang commented 8 months ago

Note: you can get records by schema version using the DataCite REST API:

https://api.datacite.org/dois?client-id=CDLCO&schema-version=3
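A minimal sketch of the same query in Python, using only the standard library. The `client-id` and `schema-version` parameters come from the URL above; reading the match count from `meta.total` assumes the usual shape of DataCite's JSON list responses, and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

DATACITE_DOIS = "https://api.datacite.org/dois"

def dois_by_schema_url(client_id: str, schema_version: str) -> str:
    """Build a DataCite REST API query for one client's DOIs at one schema version."""
    query = urlencode({"client-id": client_id, "schema-version": schema_version})
    return f"{DATACITE_DOIS}?{query}"

def total_from_response(payload: dict) -> int:
    """Return the number of matching DOIs reported in the response's meta block."""
    return int(payload["meta"]["total"])

def count_dois(client_id: str, schema_version: str) -> int:
    """Run the query and return the total (performs a network request)."""
    url = dois_by_schema_url(client_id, schema_version)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return total_from_response(json.load(resp))
```

For example, `count_dois("CDLCO", "3")` would report how many CDL DOIs are still registered at schema version 3.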

jsjiang commented 8 months ago

Sample records:

jsjiang commented 8 months ago

Test API create records:

jsjiang commented 8 months ago

Note: the default DataCite schema is set in the formElementsToDataciteXml function as:

   namespace = "http://datacite.org/schema/kernel-4"
   schemaLocation = "http://schema.datacite.org/meta/kernel-4/metadata.xsd"
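For reference, a sketch of how those two values end up on the root element of a kernel-4 document, written with Python's `xml.etree.ElementTree`. The `xsi:schemaLocation` attribute holds a whitespace-separated pair: the namespace URI, then the XSD URL. This is not EZID's code; `datacite_root` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The two values quoted from formElementsToDataciteXml
NAMESPACE = "http://datacite.org/schema/kernel-4"
SCHEMA_LOCATION = "http://schema.datacite.org/meta/kernel-4/metadata.xsd"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Serialize kernel-4 elements unprefixed (xmlns="..."); ElementTree already maps XSI to "xsi"
ET.register_namespace("", NAMESPACE)

def datacite_root() -> ET.Element:
    """Create an empty <resource> root carrying the namespace and schema location."""
    root = ET.Element(f"{{{NAMESPACE}}}resource")
    # schemaLocation pairs the namespace URI with the XSD that defines it
    root.set(f"{{{XSI}}}schemaLocation", f"{NAMESPACE} {SCHEMA_LOCATION}")
    return root
```

`ET.tostring(datacite_root(), encoding="unicode")` then yields a `<resource>` element with `xmlns`, `xmlns:xsi` and `xsi:schemaLocation` attributes.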
jsjiang commented 8 months ago

Test updating older schema records:

jsjiang commented 8 months ago

Regarding:

Questions:

Convert contributor type "Funder" to the Schema 4-compatible fundingReference element:

From:

<root xmlns:N="http://example.com/ns">
    <contributors>
        <contributor contributorType="Funder">
            <contributorName>John Doe Foundation</contributorName>
        </contributor>
        <contributor contributorType="Other">
            <contributorName>Jane Smith</contributorName>
        </contributor>
    </contributors>
</root>

To:

<root xmlns:N="http://example.com/ns">
    <contributors>
        <contributor contributorType="Other">
            <contributorName>Jane Smith</contributorName>
        </contributor>
    </contributors>
    <fundingReferences>
        <fundingReference>
            <funderName>John Doe Foundation</funderName>
        </fundingReference>
    </fundingReferences>
</root>
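A sketch of this conversion with `xml.etree.ElementTree`, assuming only the contributor name carries over (the example above maps `contributorName` to `funderName` and nothing else). `move_funder_contributors` is a hypothetical helper, not EZID's implementation; pass `ns` when the document uses the kernel-4 default namespace.

```python
import xml.etree.ElementTree as ET

def move_funder_contributors(root: ET.Element, ns: str = "") -> ET.Element:
    """Move contributorType="Funder" contributors into fundingReferences (hypothetical helper)."""
    def q(tag: str) -> str:
        # Qualify a tag name with the document namespace, if there is one
        return f"{{{ns}}}{tag}" if ns else tag

    contributors = root.find(q("contributors"))
    if contributors is None:
        return root
    funders = [c for c in contributors.findall(q("contributor"))
               if c.get("contributorType") == "Funder"]
    if not funders:
        return root

    funding_refs = root.find(q("fundingReferences"))
    if funding_refs is None:
        funding_refs = ET.SubElement(root, q("fundingReferences"))
    for contributor in funders:
        # Only the name is carried over in this sketch
        name = (contributor.findtext(q("contributorName")) or "").strip()
        contributors.remove(contributor)
        reference = ET.SubElement(funding_refs, q("fundingReference"))
        ET.SubElement(reference, q("funderName")).text = name

    # Avoid leaving an empty <contributors/> wrapper behind
    if len(contributors) == 0:
        root.remove(contributors)
    return root
```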
jsjiang commented 8 months ago

Current:

Updating:

TO-DO:

Sample schema 3 records on ezid-stg:

jsjiang commented 8 months ago

New workflow (draft)
Creating a new ID:

Updating:

Question:

New workflow until Jan 2025
Creating a new ID:

Updating:

Workflow after Jan 2025
Creating a new ID:

Updating:

jsjiang commented 8 months ago

Regarding the newly added resourceTypeGeneral values:

  1. resourceTypeGeneral is a required element, so its values must be validated
  2. EZID performs:
    • Form/UI field value validation
    • data model validation (resourceTypes in models/validation.py)
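Both checks come down to membership in the controlled vocabulary. A minimal sketch, assuming the DataCite 4.5 list is the 4.4 list plus the two new terms `Instrument` and `StudyRegistration` (the values exercised in the batch test further down). The set is transcribed by hand and `validate_resource_type_general` is a hypothetical name, so the real rules in models/validation.py may differ.

```python
# resourceTypeGeneral values as we understand the DataCite 4.5 list (4.4 terms plus
# Instrument and StudyRegistration); verify against the official list before relying on it.
RESOURCE_TYPE_GENERAL = frozenset({
    "Audiovisual", "Book", "BookChapter", "Collection", "ComputationalNotebook",
    "ConferencePaper", "ConferenceProceeding", "DataPaper", "Dataset", "Dissertation",
    "Event", "Image", "Instrument", "InteractiveResource", "Journal", "JournalArticle",
    "Model", "OutputManagementPlan", "PeerReview", "PhysicalObject", "Preprint",
    "Report", "Service", "Software", "Sound", "Standard", "StudyRegistration",
    "Text", "Workflow", "Other",
})

def validate_resource_type_general(value: str) -> str:
    """Return the trimmed value if it is a known term; raise ValueError otherwise."""
    term = value.strip()
    # In this sketch the match is exact and case-sensitive
    if term not in RESOURCE_TYPE_GENERAL:
        raise ValueError(f"invalid resourceTypeGeneral: {value!r}")
    return term
```

Note that a free-text resourceType such as `Violin` is not a valid resourceTypeGeneral; it belongs in the element text, with `Instrument` as the general type.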
jsjiang commented 7 months ago

Batch upgrade older version records to version 4.x:


jsjiang commented 7 months ago

Test batch register tool:

CDL-jjiang-9m:ezid-client-tools jjiang$ python batch-register3_stg.py -c admin:pwd -s doi:10.5062/F4 mint mapping.cfg input_datacite_4.csv
1,doi:10.5062/F42R3QTW,
2,doi:10.5062/F4Z0379V,
3,doi:10.5062/F4T72GK4,
4,doi:10.5062/F4PG1QVD,

mapping.cfg

_profile = datacite
/resource/titles/title = $1
/resource/creators/creator/creatorName = $2
/resource/creators/creator/nameIdentifier = $3
/resource/creators/creator/nameIdentifier@nameIdentifierScheme = $4
/resource/publisher = $5
/resource/publisher@publisherIdentifier = $6
/resource/publisher@publisherIdentifierScheme = $7
/resource/publisher@schemeURI = $8
/resource/publicationYear = $9
/resource/resourceType = $10
/resource/resourceType@resourceTypeGeneral = $11
_target = $12
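To make the `$n` placeholders concrete: each one refers to a 1-based column of a CSV row, so `$11` on the `resourceTypeGeneral` line picks up `Book`, `Dataset`, `Instrument` or `StudyRegistration` from the rows below. The following is a simplified, hypothetical re-implementation of that substitution, not the batch-register3 code; in particular, dropping elements whose value is empty (such as the blank `schemeURI` in rows 2 and 4) is an assumption.

```python
import csv
import io
import re

def parse_mapping(text: str) -> dict:
    """Read "element = expression" lines into an ordered dict (hypothetical parser)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        element, _, expression = line.partition("=")
        mapping[element.strip()] = expression.strip()
    return mapping

def apply_mapping(mapping: dict, row: list) -> dict:
    """Substitute $n with the n-th (1-based) CSV field; drop elements that end up empty."""
    record = {}
    for element, expression in mapping.items():
        # \d+ is greedy, so $10..$12 are not misread as $1 followed by a digit
        value = re.sub(r"\$(\d+)", lambda m: row[int(m.group(1)) - 1], expression).strip()
        if value:
            record[element] = value
    return record
```

Usage: `apply_mapping(parse_mapping(open("mapping.cfg").read()), row)` for each `row` produced by `csv.reader`.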

input_datacite_4.csv

test title 1,test creator name 1,https://orcid.org/0000-0003-1660-3511,ORCID,test publisher 1,https://ror.org/03yrm5c26,ROR,https://ror.org/,2020,pre-print,Book,https://google.com/
test title 2,test creator name 2,https://orcid.org/0000-0002-9315-0678,ORCID,test publisher 2,https://ror.org/03yrm5c26,ROR,,2021,research data,Dataset,https://google.com/
test title 3,test creator name 3,https://orcid.org/0000-0002-9315-0678,ORCID,test publisher 3,https://ror.org/01an7q238,ROR,https://ror.org/,2022,Violin,Instrument,https://google.com/
test title 4,test creator name 4,https://orcid.org/0000-0002-4216-1107,ORCID,test publisher 4,https://ror.org/01an7q238,ROR,,2023,registration,StudyRegistration,https://google.com/
jsjiang commented 7 months ago

Noticed a problem while testing the batch register script: the newly added publisher identifier sub-properties are saved as attributes in our system as expected, but they do not show up in the DataCite test system. This affects minting, creating, and updating version 4.5 records that contain these sub-properties.

Filed a ticket with DataCite support:

Subject: Testing DataCite Schema 4.5 in the test environment
to: support@datacite.org <support@datacite.org>
Sent: 4/15, 3:59PM

I ran into problems while testing the DataCite Schema 4.5 upgrade in the EZID/DataCite test environment. I created a few DOIs using the three new sub-properties of the publisher property:
publisherIdentifier
publisherIdentifierScheme
schemeUri

These sub-properties are saved as attributes to the publisher element in the XML document in our system. However, these sub-properties are not showing up in the DataCite testing system.

Here are a few sample DOIs:
https://ezid-stg.cdlib.org/manage/display_xml/doi:10.5062/F42R3QTW
with:
<publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>

However, the record created in the DataCite test system does not contain the publisherIdentifier related data elements:

https://api.test.datacite.org/dois/10.5062/F42R3QTW

      "publisher": "California Digital Library",
      "container": {},

Another example:
https://ezid-stg.cdlib.org/manage/display_xml/doi:10.5062/F4PG1QVD

https://api.test.datacite.org/dois/10.5062/F4PG1QVD

Can you please help troubleshoot this issue?
jsjiang commented 6 months ago

A note on resourceTypeGeneral support in the datacite profile, using the datacite.fieldname format:

Sample record:

_profile: datacite
_target: https://google.com
datacite.creator: test creator
datacite.title: test datacite doi
datacite.publisher: ACM
datacite.publicationyear: 2023
datacite.resourcetype: Book/Large Print
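In this format the single `datacite.resourcetype` value carries both parts: the controlled resourceTypeGeneral term before the slash and the free-text resourceType after it (`Book` and `Large Print` in the sample). A minimal parsing sketch, assuming only the first slash is significant; `split_resource_type` is a hypothetical name, and what to emit when no subtype is given is left to the caller.

```python
from typing import Optional, Tuple

def split_resource_type(value: str) -> Tuple[str, Optional[str]]:
    """Split 'General/Specific' into (resourceTypeGeneral, resourceType or None)."""
    # partition splits on the first "/" only, so a subtype may itself contain slashes
    general, _, specific = value.partition("/")
    return general.strip(), (specific.strip() or None)
```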

Resolved:

From: Xiaoli Chen <support@datacite.org>
Date: Tuesday, April 16, 2024 at 2:06 AM
To: Jing Jiang <Jing.Jiang@ucop.edu>
Subject: Re: Testing DataCite Schema 4.5 in the test environment
Hi Jing, 

Thanks for the email. 

An additional URL parameter &publisher=true is required to display the publisher information, [here's a bit more information](https://support.datacite.org/docs/can-i-see-more-detailed-affiliation-information-in-the-rest-api#publisher-identifiers).

This query should display the full publisher information:
https://api.test.datacite.org/dois/10.5062/F4PG1QVD?publisher=true 

      "publisher": {
        "name": "California Digital Library",
        "schemeUri": "https://ror.org/",
        "publisherIdentifier": "https://ror.org/01an7q238",
        "publisherIdentifierScheme": "ROR"
      },

Hope this helps, let me know if we can support further!

Best regards,
Xiaoli
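A small sketch of the check described in this reply, using only the standard library: it requests a DOI with `?publisher=true` and normalizes the `publisher` attribute, which (as the two responses above show) is a plain string without the parameter and an object with it. The `data.attributes` path follows DataCite's JSON:API response layout; the function names are hypothetical.

```python
import json
import urllib.request

def doi_api_url(doi: str, test: bool = True, publisher: bool = True) -> str:
    """Build the DataCite REST API URL for one DOI, optionally asking for full publisher details."""
    base = "https://api.test.datacite.org" if test else "https://api.datacite.org"
    url = f"{base}/dois/{doi}"
    return f"{url}?publisher=true" if publisher else url

def publisher_of(payload: dict) -> dict:
    """Return the publisher as a dict, whichever of the two response forms came back."""
    publisher = payload["data"]["attributes"]["publisher"]
    if isinstance(publisher, str):
        # Without ?publisher=true only the name is returned
        return {"name": publisher}
    return publisher

def fetch_publisher(doi: str, test: bool = True) -> dict:
    """Fetch one DOI's record and return its publisher (performs a network request)."""
    with urllib.request.urlopen(doi_api_url(doi, test), timeout=30) as resp:
        return publisher_of(json.load(resp))
```

For example, `fetch_publisher("10.5062/F4PG1QVD").get("publisherIdentifier")` should return the ROR ID once the record carries it.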
jsjiang commented 6 months ago

May 30: Merged the develop branch into merge_bdb_and_dc45 to include the Django 4.2.11 upgrade:

CDL-jjiang-9m:ezid jjiang$ git merge develop
Auto-merging impl/form_objects.py
Merge made by the 'ort' strategy.
 ansible/templates/etc/httpd/conf.d/03-ezid-nossl.conf.j2               |   7 -
 ezidapp/migrations/0001_squashed_0005_rename_index.py                  | 438 ++++++++++++++++++++++++++++++++
 ezidapp/migrations/{ => squashed_migrations}/0001_initial.py           |   0
 .../migrations/{ => squashed_migrations}/0002_auto_20221026_1139.py    |   0
 .../migrations/{ => squashed_migrations}/0003_auto_20230809_1154.py    |   0
 ezidapp/migrations/{ => squashed_migrations}/0004_minter.py            |   0
 ezidapp/migrations/squashed_migrations/0005_rename_index.py            | 158 ++++++++++++
 ezidapp/models/identifier.py                                           |  59 ++---
 ezidapp/models/link_checker.py                                         |   2 +-
 impl/form_objects.py                                                   |   2 +-
 impl/ui.py                                                             |   2 +-
 impl/ui_account.py                                                     |   4 +-
 impl/ui_admin.py                                                       |   4 +-
 impl/ui_common.py                                                      |   4 +-
 impl/ui_create.py                                                      |   2 +-
 impl/ui_manage.py                                                      |   2 +-
 impl/ui_search.py                                                      |   2 +-
 requirements-dev.txt                                                   |   2 +-
 requirements.txt                                                       |   2 +-
 settings/settings.py.j2                                                |   7 +
 ui_tags/templatetags/manage_form_tags.py                               |   2 +-
 ui_tags/templatetags/menus.py                                          |   2 +-
 22 files changed, 647 insertions(+), 54 deletions(-)
 create mode 100644 ezidapp/migrations/0001_squashed_0005_rename_index.py
 rename ezidapp/migrations/{ => squashed_migrations}/0001_initial.py (100%)
 rename ezidapp/migrations/{ => squashed_migrations}/0002_auto_20221026_1139.py (100%)
 rename ezidapp/migrations/{ => squashed_migrations}/0003_auto_20230809_1154.py (100%)
 rename ezidapp/migrations/{ => squashed_migrations}/0004_minter.py (100%)
 create mode 100644 ezidapp/migrations/squashed_migrations/0005_rename_index.py
adambuttrick commented 5 months ago

Quick start guides are updated in the attached files. If someone could please double-check them, I would appreciate it!

EZID_AdvancedCreate.pdf EZID_ResourceTypes.pdf EZID_RelationTypes.pdf

jsjiang commented 5 months ago

@adambuttrick Thank you Adam for updating the PDFs. For the advanced create guide:

  1. Publisher: we may want to add a note indicating that the sub-properties are optional.
  2. ResourceType: we need to add the mandatory attribute ResourceTypeGeneral and indicate that ResourceType is free text, while ResourceTypeGeneral uses a controlled vocabulary.

The other two files look good to me.

Jing

adambuttrick commented 5 months ago

@jsjiang Revised in the attached file. Please let me know if this reflects the desired changes.

EZID_AdvancedCreate_r1.pdf

jsjiang commented 5 months ago

@adambuttrick Hi Adam, how about replacing the original ResourceType property line (5th from the top) with the newly added ResourceType (with the sub-property resourceTypeGeneral)? An asterisk (*) is needed because it is a mandatory property.

Jing

jsjiang commented 5 months ago

June 10: Merged the develop branch into the merge_bdb_and_dc45 branch to include the Poetry implementation:

(ezid-py38) CDL-jjiang-9m:ezid jjiang$ git config pull.rebase false
(ezid-py38) CDL-jjiang-9m:ezid jjiang$ git pull origin develop
From https://github.com/CDLUC3/ezid
 * branch              develop    -> FETCH_HEAD
Merge made by the 'ort' strategy.
 .github/workflows/main.yml                   |   25 +-
 README.2.7.md                                |  141 +++++++++++
 README.md                                    |  313 ++++++++++++++---------
 ansible/group_vars/all                       |    1 +
 ansible/notes/notes.migrating_to_poetry      |  237 +++++++++++++++++
 ansible/roles/ezid/tasks/configure_ezid.yaml |   36 +--
 poetry.lock                                  | 1424 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pyproject.toml                               |   83 ++++++
 requirements-dev.txt                         |   37 +--
 requirements.txt                             |   31 ---
 setup.py => setup.py.bk                      |    0
 ui_tags/templatetags/decorators.py           |    4 +-
 update_pyproject.sh                          |   37 +++
 13 files changed, 2158 insertions(+), 211 deletions(-)
 create mode 100644 README.2.7.md
 create mode 100644 ansible/notes/notes.migrating_to_poetry
 create mode 100644 poetry.lock
 delete mode 100644 requirements.txt
 rename setup.py => setup.py.bk (100%)
 create mode 100755 update_pyproject.sh
adambuttrick commented 5 months ago

Final revision to advanced create: EZID_AdvancedCreate.pdf

jsjiang commented 5 months ago

CDL accounts with DataCite DOIs:

select user.id as user_id, user.username, user.displayName, shoulder.prefix as shoulder_prefix, shoulder.name as shoulder_name from ezidapp_user user
left join ezidapp_user_shoulders user_shoulders
on user.id = user_shoulders.user_id
left join `ezidapp_shoulder` shoulder
on shoulder.id = user_shoulders.shoulder_id
where shoulder.type = 'DOI' and shoulder.crossrefEnabled = 0
and (user.displayName like 'CDL%' or user.username='dmptool')
order by user.id, shoulder.prefix;

| user_id | username | displayName | shoulder_prefix | shoulder_name |
|---|---|---|---|---|
| 65 | eschol_harvester | CDL eScholarship | doi:10.15779/J2 | Berkeley Law Library |
| 65 | eschol_harvester | CDL eScholarship | doi:10.15779/Z38 | Berkeley Law School Journals |
| 65 | eschol_harvester | CDL eScholarship | doi:10.20353/K3 | UC Observatories |
| 65 | eschol_harvester | CDL eScholarship | doi:10.21418/G8 | UCB Geotechnical Engineering Research |
| 65 | eschol_harvester | CDL eScholarship | doi:10.21980/J8 | UCI JETem |
| 65 | eschol_harvester | CDL eScholarship | doi:10.23733/M3 | UCLA Music Library |
| 65 | eschol_harvester | CDL eScholarship | doi:10.34940/E2 | CDL eScholarship (Supplemental Material) |
| 65 | eschol_harvester | CDL eScholarship | doi:10.34950/E2 | UC Berkeley eScholarship (Supplemental Material) |
| 65 | eschol_harvester | CDL eScholarship | doi:10.34951/E2 | UC Davis eScholarship (General) |
| 65 | eschol_harvester | CDL eScholarship | doi:10.48453/S3 | Ultrasound in Resource-Limited Settings |
| 65 | eschol_harvester | CDL eScholarship | doi:10.6074/D4 | eschol DataCite |
| 65 | eschol_harvester | CDL eScholarship | doi:10.7268/P1 | UC Riverside Bourns College of Engineering |
| 65 | eschol_harvester | CDL eScholarship | doi:10.7286/ | Biocode Commons (UCB only) |
| 124 | merritt | CDL UC3 Merritt | doi:10.17916/P6 | UC Press |
| 124 | merritt | CDL UC3 Merritt | doi:10.18736/D6 | UCOP Dryad |
| 124 | merritt | CDL UC3 Merritt | doi:10.25338/B8 | UC Davis Bio Agr Eng Dash |
| 124 | merritt | CDL UC3 Merritt | doi:10.25349/D9 | UCSB Dash |
| 124 | merritt | CDL UC3 Merritt | doi:10.5068/D1 | UCLA Dash |
| 124 | merritt | CDL UC3 Merritt | doi:10.6071/M3 | UCM Dash |
| 124 | merritt | CDL UC3 Merritt | doi:10.6071/Z7 | UCM SSCZO |
| 124 | merritt | CDL UC3 Merritt | doi:10.6075/J0 | UCSD |
| 124 | merritt | CDL UC3 Merritt | doi:10.6076/D1 | UCSD Dryad |
| 124 | merritt | CDL UC3 Merritt | doi:10.6078/D1 | UCB Dash |
| 124 | merritt | CDL UC3 Merritt | doi:10.6086/D1 | UC Riverside DASH |
| 124 | merritt | CDL UC3 Merritt | doi:10.7268/P1 | UC Riverside Bourns College of Engineering |
| 124 | merritt | CDL UC3 Merritt | doi:10.7272/Q6 | UCSF Clinical & Translational Science Institute (CTSI) |
| 124 | merritt | CDL UC3 Merritt | doi:10.7280/D1 | UCI Dash |
| 124 | merritt | CDL UC3 Merritt | doi:10.7291/D1 | UCSC Dash |
| 124 | merritt | CDL UC3 Merritt | doi:10.7297/X2 | UC Berkeley Department of Linguistics |
| 124 | merritt | CDL UC3 Merritt | doi:10.7941/D1 | LBNL Dash |
| 318 | dash | CDL UC3 Dash | doi:10.17916/P6 | UC Press |
| 318 | dash | CDL UC3 Dash | doi:10.18736/D6 | UCOP Dryad |
| 318 | dash | CDL UC3 Dash | doi:10.25338/B8 | UC Davis Bio Agr Eng Dash |
| 318 | dash | CDL UC3 Dash | doi:10.25349/D9 | UCSB Dash |
| 318 | dash | CDL UC3 Dash | doi:10.5068/D1 | UCLA Dash |
| 318 | dash | CDL UC3 Dash | doi:10.6071/M3 | UCM Dash |
| 318 | dash | CDL UC3 Dash | doi:10.6075/J0 | UCSD |
| 318 | dash | CDL UC3 Dash | doi:10.6076/D1 | UCSD Dryad |
| 318 | dash | CDL UC3 Dash | doi:10.6078/D1 | UCB Dash |
| 318 | dash | CDL UC3 Dash | doi:10.6086/D1 | UC Riverside DASH |
| 318 | dash | CDL UC3 Dash | doi:10.7272/Q6 | UCSF Clinical & Translational Science Institute (CTSI) |
| 318 | dash | CDL UC3 Dash | doi:10.7280/D1 | UCI Dash |
| 318 | dash | CDL UC3 Dash | doi:10.7291/D1 | UCSC Dash |
| 318 | dash | CDL UC3 Dash | doi:10.7941/D1 | LBNL Dash |
| 468 | dmptool | DMPTool | doi:10.48321/D1 | CDL DMPTool |
jsjiang commented 5 months ago

Updated Quick Start Guides (3 PDFs, commit bea2719f4910787fb5d359f4ed1e4c892c4e2f1b)

jsjiang commented 5 months ago

Testing log: Version 2.2 records with resourceTypeGeneral:

Version 2.2 records without resourceTypeGeneral: updated these records using the batch-register3.py script.

Script output:
1,doi:10.5062/F44M92G2,
2,doi:10.5062/F40V89RB,
3,doi:10.5060/D4RN35SD/ZMUTT1894081101,
4,doi:10.5060/D4RN35SD/AGS_M_58_S63,

doi:10.5062/F40V89RB metadata on EZID:

{
    "datacite.title": "Using ISI Web of Science to Compare Top-Ranked Journals to the Citation Habits of a \"Real World\" Academic Department",
    "datacite.creator": "Jeremy Cusker",
    "datacite.publisher": "Issues in Science and Technology Librarianship",
    "datacite.publicationyear": "2012"
}
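The upgraded versions of these records (shown further down for 10.5062/f44m92g2 and 10.5062/f40v89rb) fill the missing resource type with resourceTypeGeneral="Other" and "(:unav)". A minimal sketch of that conversion for flat `datacite.*` fields like the ones above, assuming exactly that default; it is illustrative only, not EZID's conversion code, and `profile_to_kernel4` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-4"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://schema.datacite.org/meta/kernel-4/metadata.xsd"

# Emit kernel-4 elements unprefixed (xmlns="..."); plain attributes stay unqualified
ET.register_namespace("", NS)

def _add(parent, tag, text=None, **attrs):
    """Append a kernel-4 child element with optional text and attributes."""
    child = ET.SubElement(parent, f"{{{NS}}}{tag}", attrs)
    if text is not None:
        child.text = text
    return child

def profile_to_kernel4(doi: str, fields: dict) -> str:
    """Build a minimal kernel-4 record from datacite.* fields (hypothetical helper)."""
    root = ET.Element(f"{{{NS}}}resource")
    root.set(f"{{{XSI}}}schemaLocation", f"{NS} {XSD}")
    _add(root, "identifier", doi, identifierType="DOI")
    _add(_add(_add(root, "creators"), "creator"), "creatorName", fields["datacite.creator"])
    _add(_add(root, "titles"), "title", fields["datacite.title"])
    _add(root, "publisher", fields["datacite.publisher"])
    _add(root, "publicationYear", fields["datacite.publicationyear"])
    # Assumed default when no resource type is present: Other / (:unav)
    general, _, specific = fields.get("datacite.resourcetype", "").partition("/")
    _add(root, "resourceType", specific.strip() or "(:unav)",
         resourceTypeGeneral=general.strip() or "Other")
    return ET.tostring(root, encoding="unicode")
```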
adambuttrick commented 5 months ago

DMPTool tested DOI registration on stage:

https://dmphub.uc3stg.cdlib.net/dmps/10.48321/D156A867AF

Updated to a v4 record with resourceTypeGeneral="OutputManagementPlan" and resourceType="Data Management Plan":

<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.48321/D156A867AF</identifier>
<creators>
<creator>
<creatorName nameType="Personal">Maria Praetzellis</creatorName>
<nameIdentifier schemeURI="https://orcid.org/" nameIdentifierScheme="ORCID">
https://orcid.org/0000-0001-5047-3090
</nameIdentifier>
<affiliation affiliationIdentifier="https://ror.org/01an7q238" affiliationIdentifierScheme="ROR">
University of California, Berkeley
</affiliation>
</creator>
</creators>
<titles>
<title xml:lang="en-US">testing</title>
</titles>
<publisher xml:lang="en-US">DMPTool</publisher>
<publicationYear>2024</publicationYear>
<language>en</language>
<resourceType resourceTypeGeneral="OutputManagementPlan">Data Management Plan</resourceType>
<descriptions>
<description xml:lang="en" descriptionType="Abstract"> </description>
</descriptions>
<contributors>
<contributor contributorType="Producer">
<contributorName nameType="Organizational">University of California, Berkeley</contributorName>
<nameIdentifier schemeURI="https://ror.org/" nameIdentifierScheme="ROR"> https://ror.org/01an7q238 </nameIdentifier>
</contributor>
</contributors>
<fundingReferences>
<fundingReference>
<funderName>National Institutes of Health (nih.gov)</funderName>
<funderIdentifier funderIdentifierType="ROR">https://ror.org/01cwqze88</funderIdentifier>
</fundingReference>
</fundingReferences>
</resource>
adambuttrick commented 5 months ago

Testing with the script below:

import argparse
import sys

import requests

def fetch_datacite_metadata(doi, test):
    """Return the DataCite XML for a DOI from the test or production REST API."""
    base = "https://api.test.datacite.org" if test else "https://api.datacite.org"
    url = f"{base}/dois/{doi}"
    # Content negotiation: request DataCite XML instead of the default JSON
    headers = {"Accept": "application/vnd.datacite.datacite+xml"}

    response = requests.get(url, headers=headers, timeout=30)

    if response.status_code == 200:
        return response.text
    raise RuntimeError(f"Error: Unable to retrieve record (Status code: {response.status_code})")

def save_xml_to_file(doi, xml_content):
    # "/" cannot appear in a file name: 10.5062/f44m92g2 -> 10.5062_f44m92g2.xml
    filename = f"{doi.replace('/', '_')}.xml"
    with open(filename, 'w', encoding='utf-8') as file:
        file.write(xml_content)
    return filename

def main():
    parser = argparse.ArgumentParser(description="Fetch DataCite metadata in XML format.")
    parser.add_argument("-d", "--doi", required=True, help="DOI of the record to retrieve.")
    parser.add_argument("-t", "--test", action='store_true', help="Use DataCite test vs. prod")

    args = parser.parse_args()

    try:
        xml_content = fetch_datacite_metadata(args.doi, args.test)
        filename = save_xml_to_file(args.doi, xml_content)
    except (requests.RequestException, RuntimeError, OSError) as e:
        print(e, file=sys.stderr)
        sys.exit(1)
    # Report where the record was saved
    print(f"Saved {args.doi} to {filename}")

if __name__ == "__main__":
    main()

10.5062/f44m92g2: registered (https://api.test.datacite.org/dois/10.5062/f44m92g2) and updated to a v4 record with resourceTypeGeneral="Other" and resourceType="(:unav)":

<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4     http://schema.datacite.org/meta/kernel-4/metadata.xsd">
  <identifier identifierType="DOI">10.5062/F44M92G2</identifier>
  <creators>
    <creator>
      <creatorName>Dianne Dietrich et al</creatorName>
    </creator>
  </creators>
  <titles>
    <title>De-Mystifying the Data Management Requirements of Research Funders</title>
  </titles>
  <publisher>Issues in Science and Technology Librarianship</publisher>
  <publicationYear>2012</publicationYear>
  <resourceType resourceTypeGeneral="Other">(:unav)</resourceType>
</resource>

10.5062/f40v89rb: registered (https://api.test.datacite.org/dois/10.5062/f40v89rb) and updated to a v4 record with resourceTypeGeneral="Other" and resourceType="(:unav)":

<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4     http://schema.datacite.org/meta/kernel-4/metadata.xsd">
  <identifier identifierType="DOI">10.5062/F40V89RB</identifier>
  <creators>
    <creator>
      <creatorName>Jeremy Cusker</creatorName>
    </creator>
  </creators>
  <titles>
    <title>Using ISI Web of Science to Compare Top-Ranked Journals to the Citation Habits of a "Real World" Academic Department</title>
  </titles>
  <publisher>Issues in Science and Technology Librarianship</publisher>
  <publicationYear>2012</publicationYear>
  <resourceType resourceTypeGeneral="Other">(:unav)</resourceType>
</resource>
jsjiang commented 5 months ago

Testing log for updating version 3 records:

adambuttrick commented 5 months ago

Create DOI 10.5072/FK29K4GW2H via the UI, incorporating all Schema 4.5 changes.

Created successfully:

<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
    <identifier identifierType="DOI">10.5072/FK29K4GW2H</identifier>
    <creators>
        <creator>
            <creatorName>EZID</creatorName>
        </creator>
    </creators>
    <titles>
        <title xml:lang="en">Test Study Registration</title>
    </titles>
    <publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
    <publicationYear>2024</publicationYear>
    <resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
    <contributors>
        <contributor contributorType="DataCollector">
            <contributorName>Lucky</contributorName>
            <familyName>Hakoyama</familyName>
            <affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
        </contributor>
    </contributors>
    <relatedIdentifiers>
        <relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE00AO</relatedIdentifier>
    </relatedIdentifiers>
    <fundingReferences>
        <fundingReference>
            <funderName>Wellcome Trust</funderName>
            <funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
            <awardNumber>23456</awardNumber>
        </fundingReference>
    </fundingReferences>
</resource>

Update title using the API:

import argparse
import requests
import urllib.parse

def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Update DOI metadata using EZID API")
    parser.add_argument("-d", "--doi", required=True, help="DOI to update")
    parser.add_argument("-x", "--xml_file", required=True,
                        help="Path to XML file containing metadata")
    parser.add_argument("-u", "--username", required=True,
                        help="EZID username")
    parser.add_argument("-p", "--password", required=True,
                        help="EZID password")
    parser.add_argument("-e", "--environment", required=True, choices=['stg', 'prd'],
                        help="Choose environment: 'stg' for staging, 'prd' for production")
    return parser.parse_args()

def get_base_url(environment):
    if environment == 'stg':
        return "https://ezid-stg.cdlib.org"
    elif environment == 'prd':
        return "https://ezid.cdlib.org"
    else:
        raise ValueError("Invalid environment. Choose 'stg' or 'prd'.")

def encode_doi(doi):
    return urllib.parse.quote(doi, safe="")

def read_xml_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

def prepare_update_data(xml_content):
    # Percent-encode %, LF and CR so the XML fits on a single ANVL "name: value" line
    escaped_content = xml_content.replace(
        '%', '%25').replace('\n', '%0A').replace('\r', '%0D')
    return f"datacite: {escaped_content}"

def update_doi_metadata(base_url, doi, xml_content, username, password):
    encoded_doi = encode_doi(doi)
    update_data = prepare_update_data(xml_content)
    response = requests.post(
        f"{base_url}/id/{encoded_doi}",
        auth=(username, password),
        data=update_data.encode('utf-8'),
        headers={'Content-Type': 'text/plain; charset=UTF-8'}
    )
    return response

def handle_response(response, doi):
    if response.status_code == 200:
        print(f"Successfully updated the metadata of {doi}")
        print(f"Response: {response.text}")
    else:
        print(f"Failed to update the metadata. Status code: {response.status_code}")
        print(f"Response: {response.text}")

def main():
    args = parse_arguments()
    base_url = get_base_url(args.environment)
    xml_content = read_xml_file(args.xml_file)
    response = update_doi_metadata(
        base_url,
        args.doi,
        xml_content,
        args.username,
        args.password
    )
    handle_response(response, args.doi)

if __name__ == "__main__":
    main()
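The `prepare_update_data` step is ANVL escaping: the EZID API takes `name: value` lines, so characters that would break a line (and `%` itself) are percent-encoded. A sketch of a symmetric encoder/decoder pair, assuming the conventions in the EZID API documentation as we understand them; this version also encodes `:` so a name can never be split early. The helper names are hypothetical.

```python
import re

def anvl_escape(text: str) -> str:
    """Percent-encode %, ':', CR and LF (e.g. '%' -> '%25', newline -> '%0A')."""
    return re.sub(r"[%:\r\n]", lambda m: "%{:02X}".format(ord(m.group(0))), text)

def anvl_unescape(text: str) -> str:
    """Reverse anvl_escape by decoding every %XX sequence."""
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)

def to_anvl(metadata: dict) -> str:
    """Serialize a metadata dict as one 'name: value' line per element."""
    return "\n".join(f"{anvl_escape(k)}: {anvl_escape(v)}" for k, v in metadata.items())

def from_anvl(body: str) -> dict:
    """Parse 'name: value' lines back into a dict (surrounding whitespace is not kept)."""
    metadata = {}
    # Split on LF only; escaped values never contain a raw LF
    for line in body.split("\n"):
        if not line:
            continue
        name, _, value = line.partition(":")
        metadata[anvl_unescape(name.strip())] = anvl_unescape(value.strip())
    return metadata
```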

Title updated successfully:

https://ezid-stg.cdlib.org/manage/display_xml/doi:10.5072/FK29K4GW2H

<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
    <identifier identifierType="DOI">10.5072/FK29K4GW2H</identifier>
    <creators>
        <creator>
            <creatorName>EZID</creatorName>
        </creator>
    </creators>
    <titles>
        <title xml:lang="en">API Update - Test Study Registration</title>
    </titles>
    <publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
    <publicationYear>2024</publicationYear>
    <resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
    <contributors>
        <contributor contributorType="DataCollector">
            <contributorName>Lucky</contributorName>
            <familyName>Hakoyama</familyName>
            <affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
        </contributor>
    </contributors>
    <relatedIdentifiers>
        <relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE00AO</relatedIdentifier>
    </relatedIdentifiers>
    <fundingReferences>
        <fundingReference>
            <funderName>Wellcome Trust</funderName>
            <funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
            <awardNumber>23456</awardNumber>
        </fundingReference>
    </fundingReferences>
</resource>
adambuttrick commented 5 months ago

Create DOI 10.5072/FK29K4GB3I via the API, incorporating all Schema 4.5 changes.

import argparse
import requests
import urllib.parse

def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Create or update DOI metadata using EZID API")
    parser.add_argument("-d", "--doi", required=True,
                        help="DOI to create or update")
    parser.add_argument("-x", "--xml_file", required=True,
                        help="Path to XML file containing metadata")
    parser.add_argument("-u", "--username", required=True,
                        help="EZID username")
    parser.add_argument("-p", "--password", required=True,
                        help="EZID password")
    parser.add_argument("-e", "--environment", required=True, choices=['stg', 'prd'],
                        help="Choose environment: 'stg' for staging, 'prd' for production")
    parser.add_argument("-a", "--action", required=True, choices=['create', 'update'],
                        help="Choose action: 'create' for new DOI, 'update' for existing DOI")
    return parser.parse_args()

def get_base_url(environment):
    if environment == 'stg':
        return "https://ezid-stg.cdlib.org"
    elif environment == 'prd':
        return "https://ezid.cdlib.org"
    else:
        raise ValueError("Invalid environment. Choose 'stg' or 'prd'.")

def encode_doi(doi):
    return urllib.parse.quote(doi, safe="")

def read_xml_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

def prepare_update_data(xml_content):
    # Percent-encode %, LF and CR so the XML fits on a single ANVL "name: value" line
    escaped_content = xml_content.replace(
        '%', '%25').replace('\n', '%0A').replace('\r', '%0D')
    return f"datacite: {escaped_content}"

def create_or_update_doi_metadata(base_url, doi, xml_content, username, password, action):
    # Both actions use the same /id/{identifier} URL: PUT creates, POST modifies
    url = f"{base_url}/id/{encode_doi(doi)}"
    method = requests.put if action == 'create' else requests.post
    update_data = prepare_update_data(xml_content)

    response = method(
        url,
        auth=(username, password),
        data=update_data.encode('utf-8'),
        headers={'Content-Type': 'text/plain; charset=UTF-8'}
    )
    return response

def handle_response(response, doi, action):
    if response.status_code in [200, 201]:
        print(f"Successfully {'created' if action == 'create' else 'updated'} the metadata of {doi}")
        print(f"Response: {response.text}")
    else:
        print(f"Failed to {'create' if action == 'create' else 'update'} the metadata. Status code: {response.status_code}")
        print(f"Response: {response.text}")

def main():
    args = parse_arguments()
    base_url = get_base_url(args.environment)
    xml_content = read_xml_file(args.xml_file)
    response = create_or_update_doi_metadata(
        base_url,
        args.doi,
        xml_content,
        args.username,
        args.password,
        args.action
    )
    handle_response(response, args.doi, args.action)

if __name__ == "__main__":
    main()
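Besides the status code, EZID's response body states the outcome. As we read the EZID API documentation, the first line of each response is either `success: ...` (followed by the identifier) or `error: ...` (followed by a reason); a hedged sketch of parsing that line, where `parse_ezid_status` is a hypothetical helper that `handle_response` could call.

```python
from typing import Tuple

def parse_ezid_status(body: str) -> Tuple[bool, str]:
    """Split the first response line into (succeeded, detail)."""
    first_line = body.split("\n", 1)[0].strip()
    # Split at the first colon only; the detail may itself contain colons (doi:...)
    status, _, detail = first_line.partition(":")
    status = status.strip()
    if status not in ("success", "error"):
        raise ValueError(f"unexpected EZID response: {first_line!r}")
    return status == "success", detail.strip()
```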

Created successfully:

<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
    <identifier identifierType="DOI">10.5072/FK29K4GB3I</identifier>
    <creators>
        <creator>
            <creatorName>EZID</creatorName>
        </creator>
    </creators>
    <titles>
        <title xml:lang="en">API Update - Test Study Registration</title>
    </titles>
    <publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
    <publicationYear>2024</publicationYear>
    <resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
    <contributors>
        <contributor contributorType="DataCollector">
            <contributorName>Lucky</contributorName>
            <familyName>The Poodle</familyName>
            <affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
        </contributor>
    </contributors>
    <relatedIdentifiers>
        <relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE10AO</relatedIdentifier>
    </relatedIdentifiers>
    <fundingReferences>
        <fundingReference>
            <funderName>Wellcome Trust</funderName>
            <funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
            <awardNumber>23456</awardNumber>
        </fundingReference>
    </fundingReferences>
</resource>
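Schema 4.4 and 4.5 records share the `http://datacite.org/schema/kernel-4` namespace, so a successful create does not by itself show that the 4.5 additions were stored. A sketch using the standard library's ElementTree to report which 4.5 features a returned record actually uses; the feature sets below are an assumption (publisher identifier attributes, the `Instrument` and `StudyRegistration` resource types, and the `Collects`/`IsCollectedBy` relation types) and should be checked against DataCite's 4.5 documentation.

```python
import xml.etree.ElementTree as ET

NS = {"dc": "http://datacite.org/schema/kernel-4"}
# Assumed 4.5 additions; extend these sets if the checklist grows.
NEW_RESOURCE_TYPES = {"Instrument", "StudyRegistration"}
NEW_RELATION_TYPES = {"Collects", "IsCollectedBy"}


def schema_45_features(xml_text):
    """Return the (assumed) schema-4.5 features present in a DataCite record."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    found = {}

    publisher = root.find("dc:publisher", NS)
    if publisher is not None and publisher.get("publisherIdentifier"):
        found["publisherIdentifier"] = publisher.get("publisherIdentifier")
        found["publisherIdentifierScheme"] = publisher.get("publisherIdentifierScheme")

    rtype = root.find("dc:resourceType", NS)
    if rtype is not None and rtype.get("resourceTypeGeneral") in NEW_RESOURCE_TYPES:
        found["resourceTypeGeneral"] = rtype.get("resourceTypeGeneral")

    relations = sorted({
        rel.get("relationType")
        for rel in root.iterfind(".//dc:relatedIdentifier", NS)
        if rel.get("relationType") in NEW_RELATION_TYPES
    })
    if relations:
        found["relationType"] = relations
    return found
```

Run against the record above, this should report the ROR publisher identifier and `StudyRegistration`; an empty result means the record only uses 4.4-compatible fields.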

Updated the record via the UI, altering the 4.5 fields. The update succeeded:

<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
    <identifier identifierType="DOI">10.5072/FK29K4GB3I</identifier>
    <creators>
        <creator>
            <creatorName>EZID</creatorName>
        </creator>
    </creators>
    <titles>
        <title xml:lang="en">API Update - Test Study Registration</title>
    </titles>
    <publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
    <publicationYear>2024</publicationYear>
    <resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
    <contributors>
        <contributor contributorType="DataCollector">
            <contributorName>Lucky</contributorName>
            <familyName>The Poodle</familyName>
            <affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
        </contributor>
    </contributors>
    <relatedIdentifiers>
        <relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE10AO</relatedIdentifier>
    </relatedIdentifiers>
    <fundingReferences>
        <fundingReference>
            <funderName>Wellcome Trust</funderName>
            <funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
            <awardNumber>23456</awardNumber>
        </fundingReference>
    </fundingReferences>
</resource>
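When testing an edit made through the UI, one way to see exactly which fields changed is to fetch the record before and after the edit and diff them field by field. The sketch below flattens each record into path/value pairs; `flatten` and `diff_records` are hypothetical helpers, and matching repeated sibling elements by position is a simplifying assumption.

```python
import xml.etree.ElementTree as ET


def local(name):
    # ElementTree spells namespaced names as "{uri}local"; keep only "local".
    return name.split("}", 1)[-1]


def flatten(xml_text):
    """Map element paths and attributes of a record to their values."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    out = {}

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            out[path] = text
        for name, value in elem.attrib.items():
            out[f"{path}/@{local(name)}"] = value
        # Number repeated siblings (creator[1], creator[2], ...) by position.
        counts = {}
        for child in elem:
            tag = local(child.tag)
            counts[tag] = counts.get(tag, 0) + 1
            walk(child, f"{path}/{tag}[{counts[tag]}]")

    walk(root, "/" + local(root.tag))
    return out


def diff_records(before, after):
    """Return {path: (old, new)} for every value that differs."""
    a, b = flatten(before), flatten(after)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

Passing the XML fetched before and after a UI edit returns each changed path with its old and new value; an empty result means the edit did not change the stored record.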
jsjiang commented 5 months ago

Production deployment:

jsjiang commented 5 months ago
jsjiang commented 4 months ago

Production deployment:

jsjiang commented 4 months ago

Deployed v3.2.13 on uc3-ezidui-prd02 first, then on uc3-ezidui-prd01 at 6:59am.

Prd02 at 6:50am:

Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Exec[ansible-playbook]: Triggered 'refresh' from 2 events
Info: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Exec[ansible-playbook]: Scheduling refresh of Service[ezid]
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Service[ezid]: Triggered 'refresh' from 1 event
Notice: Applied catalog in 63.06 seconds
uc3_pupapply.sh: Execution of 'puppet apply' complete. Exit code: 2
uc3puppet@uc3-ezidui-prd02:~> exit
logout
uc3-ezidui-prd02:/home/jjiang>sudo su - ezid
Last login: Wed Jul 10 06:49:58 PDT 2024
ezid@uc3-ezidui-prd02:06:51:28:~$ ps -ef | grep ezid
ezid     1181285       1  0 06:50 ?        00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid     1181286 1181285  2 06:50 ?        00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid     1181287 1181285  0 06:50 ?        00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid     1181288 1181285  0 06:50 ?        00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid     1181289 1181285  0 06:50 ?        00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid     1181558 1181285  0 06:50 ?        00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
root     1181614 1175628  0 06:51 pts/0    00:00:00 sudo su - ezid
root     1181616 1181614  0 06:51 pts/1    00:00:00 sudo su - ezid
root     1181617 1181616  0 06:51 pts/1    00:00:00 su - ezid
ezid     1181618 1181617  0 06:51 pts/1    00:00:00 -bash
ezid     1181848 1181618  0 06:51 pts/1    00:00:00 ps -ef
ezid     1181849 1181618  0 06:51 pts/1    00:00:00 grep --color=auto ezid
ezid@uc3-ezidui-prd02:06:51:33:~$ cdlsysctl stop ezid
Failed to stop ezid.service: Access denied
See system logs and 'systemctl status ezid.service' for details.
ezid@uc3-ezidui-prd02:06:51:47:~$ sudo cdlsysctl stop ezid
ezid@uc3-ezidui-prd02:06:51:58:~$ pwd
/ezid
ezid@uc3-ezidui-prd02:06:52:00:~$ cd ezid
ezid@uc3-ezidui-prd02:06:52:03:~/ezid$ python manage.py migrate
System check identified some issues:

WARNINGS:
?: (mysql.W002) MySQL Strict Mode is not set for database connection 'default'
    HINT: MySQL's Strict Mode fixes many data integrity problems in MySQL, such as data truncation upon insertion, by escalating warnings into errors. It is strongly recommended you activate it. See: https://docs.djangoproject.com/en/4.2/ref/databases/#mysql-sql-mode
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, ezidapp, sessions
Running migrations:
  Applying ezidapp.0006_alter_searchidentifier_searchableresourcetype... OK
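The `puppet apply` runs in this log end with `Exit code: 2`, which looks alarming but is expected if `uc3_pupapply.sh` runs Puppet with `--detailed-exitcodes` (an assumption; the wrapper script is not shown here). Under that flag, 0 means no changes, 1 means the run failed, and otherwise the value is a bitmask where 2 means resources were changed and 4 means some resources failed. A small hypothetical decoder:

```python
def describe_puppet_exit(code):
    """Interpret a `puppet apply --detailed-exitcodes` status code."""
    if code == 0:
        return "run completed, no changes"
    if code == 1:
        return "run failed"
    parts = []
    if code & 2:
        parts.append("resources changed")
    if code & 4:
        parts.append("some resources failed")
    # Any bit other than 2 and 4 is outside the documented set.
    if not parts or code & ~6:
        return f"unexpected exit code {code}"
    return "run completed: " + ", ".join(parts)
```

So `Exit code: 2` after "Applied catalog" reads as a successful run that changed resources, consistent with the service refreshes logged just before it.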

Prd01 at 6:59am:

Info: Systemd::Daemon_reload[ezid-proc-search-indexer.service]: Scheduling refresh of Service[ezid-proc-search-indexer.service]
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Uc3_ezid_ui::Background_job[proc-search-indexer]/Systemd::Unit_file[ezid-proc-search-indexer.service]/Service[ezid-proc-search-indexer.service]: Triggered 'refresh' from 2 events
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Uc3_ezid_ui::Background_job[proc-stats]/Systemd::Unit_file[ezid-proc-stats.service]/Systemd::Daemon_reload[ezid-proc-stats.service]/Exec[systemd-ezid-proc-stats.service-systemctl-daemon-reload]: Triggered 'refresh' from 1 event
Info: Systemd::Daemon_reload[ezid-proc-stats.service]: Scheduling refresh of Service[ezid-proc-stats.service]
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Uc3_ezid_ui::Background_job[proc-stats]/Systemd::Unit_file[ezid-proc-stats.service]/Service[ezid-proc-stats.service]: Triggered 'refresh' from 2 events
Notice: Applied catalog in 97.32 seconds
uc3_pupapply.sh: Execution of 'puppet apply' complete. Exit code: 2
uc3puppet@uc3-ezidui-prd01:~> exit
logout
uc3-ezidui-prd01:/home/jjiang>sudo su - ezid
Last login: Wed Jul 10 06:57:12 PDT 2024
ezid@uc3-ezidui-prd01:06:59:03:~$ cd ezid-ops-scripts/scripts/
ezid@uc3-ezidui-prd01:06:59:09:~/ezid-ops-scripts/scripts$ python verify_ezid_after_patching.py -e prd
ok 1.1 - Verify EZID status
info 1.2 - EZID version - v3.2.13
ok 2 - Verify search function
## Create identifier
ok 3.1 - doi:10.15697/FK2T616 created
ok 3.2 - doi:10.15697/FK2PC91 created
ok 3.3 - ark:/99999/fk47q0nr3s created
ok 3.4 - doi:10.5072/FK22R40T38 created
ok 3.5 - doi:10.5072/FK2Z03956W created
ok 3.6 - ark:/99999/fk43x9xx80 created
ok 3.7 - doi:10.5072/FK2T72KC3P created
ok 3.8 - ark:/99999/fk4059742f created
## Update identifier
ok 4 - ark:/99999/fk4059742f updated with new data: b'_target: https://cdlib.org/services/\n'
## Check background job status
ok 5.1 - ezid-proc-binder active running
ok 5.2 - ezid-proc-cleanup-async-queues active running
error 5.3 - ezid-proc-celery is not running
ok 5.4 - ezid-proc-crossref active running
ok 5.5 - ezid-proc-datacite active running
ok 5.6 - ezid-proc-download active running
ok 5.7 - ezid-proc-expunge active running
ok 5.8 - ezid-proc-newsfeed active running
ok 5.9 - ezid-proc-search-indexer active running
ok 5.10 - ezid-proc-stats active running
ok 5.11 - ezid-proc-link-checker active running
ok 5.12 - ezid-proc-link-checker-update active running
## Check batch download from S3
waiting for file to become available: 5s passed
waiting for file to become available: 10s passed
waiting for file to become available: 15s passed
waiting for file to become available: 20s passed
waiting for file to become available: 25s passed
ok 6 - batch download file is available at: https://ezid.cdlib.org/s3_download/t2OJoO013P4iacBh.csv.gz
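The verification output mixes `ok`, `info`, and `error` lines with section headers and progress messages, so a failure such as check 5.3 (`ezid-proc-celery is not running`) is easy to miss. A minimal sketch for summarizing a captured run, assuming those three prefixes are the only status markers `verify_ezid_after_patching.py` emits:

```python
import re
from collections import Counter

# Matches lines such as "ok 3.1 - doi:10.15697/FK2T616 created" or
# "error 5.3 - ezid-proc-celery is not running".
STATUS_LINE = re.compile(r"^(ok|info|error)\s+([\d.]+)\s+-\s+(.*)$")


def summarize(output):
    """Count status lines by prefix and collect the failing checks."""
    counts = Counter()
    problems = []
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if not match:
            continue  # section headers ("## ...") and progress messages
        status, number, message = match.groups()
        counts[status] += 1
        if status == "error":
            problems.append(f"{number}: {message}")
    return counts, problems
```

For this run, `problems` would contain only the celery check, which makes it easy to decide whether a failure needs follow-up before closing out the deployment.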
jsjiang commented 4 months ago

Post-deployment tests:

jsjiang commented 4 months ago

I'm seeing RDS available memory decrease starting at 6:15 this morning: it dropped from 1K to 137MB and has remained low, in the 100MB range. This is probably not related to the deployment, since the drop began before the deployment started on prd02 at 6:50am. I will create a ticket for this.

IAS ticket: https://github.com/cdlib/cdlsys/issues/538

jsjiang commented 4 months ago