mariagould closed 4 months ago
Need updates to the following static pages based on Schema 4.5
Note: in EZID_AdvancedCreate.pdf and EZID_ResourceTypes.pdf, change "resourceType" to "resourceTypeGeneral" accordingly, and add the newly added allowed terms.
manage.py collectstatic --clear command
Records by schema versions (https://doi.datacite.org/providers/cdlco/dois):
Test cases:
Note: you can get records by schema using DataCite API:
https://api.datacite.org/dois?client-id=CDLCO&schema-version=3
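The query above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `dois_by_schema_url` is hypothetical, and the only parameters used are the two that appear in the URL above (`client-id` and `schema-version`).

```python
from urllib.parse import urlencode

API_BASE = "https://api.datacite.org"


def dois_by_schema_url(client_id, schema_version):
    """Hypothetical helper: build the DOI list query for one client and schema version."""
    query = urlencode({"client-id": client_id, "schema-version": schema_version})
    return f"{API_BASE}/dois?{query}"


if __name__ == "__main__":
    # Reproduces the URL quoted in the note above.
    print(dois_by_schema_url("CDLCO", 3))
```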
Sample records:
10.5062/f4445jf2 (schema 2.2) https://doi.datacite.org/dois/10.5062%2Ff4445jf2 Record on DataCite:
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-2.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datacite.org/schema/kernel-2.2
http://schema.datacite.org/meta/kernel-2.2/metadata.xsd">
<identifier identifierType="DOI">10.5062/F4445JF2</identifier>
<creators>
<creator>
<creatorName>Flora Shrode</creatorName>
</creator>
</creators>
<titles>
<title>The Intellectual Foundation of Information Organization [Review]</title>
</titles>
<publisher>Issues in Science and Technology Librarianship</publisher>
<publicationYear>2000</publicationYear>
<resourceType resourceTypeGeneral="Text"/>
</resource>
Record in EZID:
{
"datacite.title": "The Intellectual Foundation of Information Organization [Review]",
"datacite.creator": "Flora Shrode",
"datacite.publisher": "Issues in Science and Technology Librarianship",
"datacite.resourcetype": "Text",
"datacite.publicationyear": "2000"
}
10.5062/f4bg2m07 (schema 3) https://doi.datacite.org/dois/10.5062%2Ff4bg2m07 Record on DataCite:
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datacite.org/schema/kernel-3
http://schema.datacite.org/meta/kernel-3/metadata.xsd">
<identifier identifierType="DOI">10.5062/F4BG2M07</identifier>
<creators>
<creator>
<creatorName>Michael Fosmire</creatorName>
</creator>
</creators>
<titles>
<title>Science Media Education: Opportunities for Libraries?</title>
</titles>
<publisher>Issues in Science &amp; Technology Librarianship</publisher>
<publicationYear>2016</publicationYear>
<resourceType resourceTypeGeneral="Text"/>
</resource>
Record in EZID:
{
"datacite.title": "Science Media Education: Opportunities for Libraries?",
"datacite.creator": "Michael Fosmire",
"datacite.publisher": "Issues in Science & Technology Librarianship",
"datacite.resourcetype": "Text",
"datacite.publicationyear": "2016"
}
10.5062/f4p848v3 (schema 4) https://doi.datacite.org/dois/10.5062%2Ff4p848v3 Record on DataCite:
<?xml version="1.0" encoding="utf-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.5062/F4P848V3</identifier>
<creators>
<creator>
<creatorName>Erin O'Toole</creatorName>
</creator>
</creators>
<titles>
<title>Selected Internet Resources on Genetically Modified Organisms (GMOs)</title>
</titles>
<publisher>Issues in Science and Technology Librarianship</publisher>
<publicationYear>2010</publicationYear>
<resourceType resourceTypeGeneral="Text">Text</resourceType>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.5062/F4P848V</relatedIdentifier>
</relatedIdentifiers>
</resource>
Record in EZID:
{
"datacite.title": "Selected Internet Resources on Genetically Modified Organisms (GMOs)",
"datacite.creator": "Erin O'Toole",
"datacite.publisher": "Issues in Science and Technology Librarianship",
"datacite.resourcetype": "Text",
"datacite.publicationyear": "2010"
}
Test API create records:
doi:10.5062/F4 (https://ezid-dev.cdlib.org/shoulder/doi:10.5062/F4):
doi:10.5062/F4 (https://ezid-dev.cdlib.org/shoulder/doi:10.5062/F4):
doi:10.5062/F4 (https://ezid-dev.cdlib.org/shoulder/doi:10.5062/F4):
Note: the default DataCite schema is set in the formElementsToDataciteXml function as:
namespace = "http://datacite.org/schema/kernel-4"
schemaLocation = "http://schema.datacite.org/meta/kernel-4/metadata.xsd"
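To make the note concrete, here is a minimal sketch of how a `resource` root element carrying those two values could be built with Python's standard `xml.etree.ElementTree`. This is not the code of `formElementsToDataciteXml`; the helper name `build_resource_root` is hypothetical and only the two quoted values come from the note.

```python
import xml.etree.ElementTree as ET

# The two values quoted in the note above.
NAMESPACE = "http://datacite.org/schema/kernel-4"
SCHEMA_LOCATION = "http://schema.datacite.org/meta/kernel-4/metadata.xsd"
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def build_resource_root():
    """Hypothetical helper: return an empty <resource> element for schema 4."""
    # Register prefixes so serialization uses xmlns="..." and xmlns:xsi="...".
    ET.register_namespace("", NAMESPACE)
    ET.register_namespace("xsi", XSI)
    root = ET.Element(f"{{{NAMESPACE}}}resource")
    # xsi:schemaLocation pairs the namespace with the location of its XSD.
    root.set(f"{{{XSI}}}schemaLocation", f"{NAMESPACE} {SCHEMA_LOCATION}")
    return root


if __name__ == "__main__":
    print(ET.tostring(build_resource_root(), encoding="unicode"))
```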
Test updating older schema records:
Regarding:
Questions:
upgradeDcmsRecord function in the datacite.py script. This function converts a DataCite metadata schema record to the latest version (currently version 4) by
If the resourceType element (a Schema 4 required data element) is not provided, add one with default values (e.attrib["resourceTypeGeneral"] = "Other", e.text = "(:unav)")
If resourceType exists but resourceTypeGeneral is missing: do not modify the record and let the validation process report the error.
Convert contributor type "funder" to Schema 4 compatible data element:
From:
<root xmlns:N="http://example.com/ns">
<contributors>
<contributor contributorType="Funder">
<contributorName>John Doe Foundation</contributorName>
</contributor>
<contributor contributorType="Other">
<contributorName>Jane Smith</contributorName>
</contributor>
</contributors>
</root>
To:
<root xmlns:N="http://example.com/ns">
<contributors>
<contributor contributorType="Other">
<contributorName>Jane Smith</contributorName>
</contributor>
</contributors>
<fundingReferences>
<fundingReference>
<funderName>John Doe Foundation</funderName>
</fundingReference>
</fundingReferences>
</root>
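The conversions described above can be sketched as follows. This is an illustration written against the un-namespaced From/To sample, not the actual upgradeDcmsRecord code; the function name `upgrade_record` is hypothetical, and dropping an emptied `contributors` container is an assumption that the sample does not cover. Real DataCite records are namespaced, so the element lookups would need the namespace added.

```python
import xml.etree.ElementTree as ET


def upgrade_record(root):
    """Hypothetical sketch of the conversions described above (not EZID's code)."""
    # 1. Add a default resourceType only when the element is absent entirely.
    #    An existing resourceType without resourceTypeGeneral is left alone so
    #    that validation can report the problem.
    if root.find("resourceType") is None:
        e = ET.SubElement(root, "resourceType")
        e.attrib["resourceTypeGeneral"] = "Other"
        e.text = "(:unav)"

    # 2. Move contributors of type "Funder" into fundingReferences.
    contributors = root.find("contributors")
    if contributors is not None:
        funders = [c for c in contributors.findall("contributor")
                   if c.get("contributorType") == "Funder"]
        if funders:
            refs = root.find("fundingReferences")
            if refs is None:
                refs = ET.SubElement(root, "fundingReferences")
            for c in funders:
                ref = ET.SubElement(refs, "fundingReference")
                name = ET.SubElement(ref, "funderName")
                name.text = c.findtext("contributorName")
                contributors.remove(c)
        # Assumption: drop the container if no contributors remain.
        if len(contributors) == 0:
            root.remove(contributors)
    return root
```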
Current:
Updating:
TO-DO:
Sample schema 3 records on ezid-stg:
New workflow (draft)
Creating new ID:
Updating:
Question:
New workflow until Jan 2025
Creating new ID:
Updating:
Workflow after Jan 2025
Creating new ID:
Updating:
Regarding the newly added resourceTypeGeneral values:
resourceTypeGeneral is a required data element and validation is required
Batch upgrade older version records to version 4.x:
Test batch register tool:
CDL-jjiang-9m:ezid-client-tools jjiang$ python batch-register3_stg.py -c admin:pwd -s doi:10.5062/F4 mint mapping.cfg input_datacite_4.csv
1,doi:10.5062/F42R3QTW,
2,doi:10.5062/F4Z0379V,
3,doi:10.5062/F4T72GK4,
4,doi:10.5062/F4PG1QVD,
mapping.cfg
_profile = datacite
/resource/titles/title = $1
/resource/creators/creator/creatorName = $2
/resource/creators/creator/nameIdentifier = $3
/resource/creators/creator/nameIdentifier@nameIdentifierScheme = $4
/resource/publisher = $5
/resource/publisher@publisherIdentifier = $6
/resource/publisher@publisherIdentifierScheme = $7
/resource/publisher@schemeURI = $8
/resource/publicationYear = $9
/resource/resourceType = $10
/resource/resourceType@resourceTypeGeneral = $11
_target = $12
input_datacite_4.csv
test title 1,test creator name 1,https://orcid.org/0000-0003-1660-3511,ORCID,test publisher 1,https://ror.org/03yrm5c26,ROR,https://ror.org/,2020,pre-print,Book,https://google.com/
test title 2,test creator name 2,https://orcid.org/0000-0002-9315-0678,ORCID,test publisher 2,https://ror.org/03yrm5c26,ROR,,2021,research data,Dataset,https://google.com/
test title 3,test creator name 3,https://orcid.org/0000-0002-9315-0678,ORCID,test publisher 3,https://ror.org/01an7q238,ROR,https://ror.org/,2022,Violin,Instrument,https://google.com/
test title 4,test creator name 4,https://orcid.org/0000-0002-4216-1107,ORCID,test publisher 4,https://ror.org/01an7q238,ROR,,2023,registration,StudyRegistration,https://google.com/
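To show how the `$N` placeholders in mapping.cfg line up with the CSV columns, here is a small standard-library sketch. It is not how the batch register script is implemented; `parse_mapping` and `apply_mapping` are hypothetical helpers, and constant lines such as `_profile = datacite` are simply skipped here.

```python
import csv
import io


def parse_mapping(text):
    """Parse 'destination = $N' lines into (destination, column_number) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        dest, _, value = line.partition("=")
        value = value.strip()
        # Only column references like $3 are handled; constants are skipped.
        if value.startswith("$") and value[1:].isdigit():
            pairs.append((dest.strip(), int(value[1:])))
    return pairs


def apply_mapping(pairs, row):
    """Return {destination: cell} for one CSV row; column numbers are 1-based."""
    return {dest: row[idx - 1] for dest, idx in pairs if idx <= len(row)}


if __name__ == "__main__":
    mapping = "_profile = datacite\n/resource/titles/title = $1\n_target = $3"
    row = next(csv.reader(io.StringIO("test title 1,ignored,https://google.com/")))
    print(apply_mapping(parse_mapping(mapping), row))
```

Run against the first row of input_datacite_4.csv, column 11 (`Book`) lands on `/resource/resourceType@resourceTypeGeneral` and column 12 on `_target`.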
Noticed a problem while testing the batch register script: the newly added publisher identifier related sub-properties are saved as attributes in our system as expected but they are not showing up in DataCite testing system. This affects minting, creating and updating version 4.5 records containing the publisher identifier related sub-properties.
Filed a ticket with DataCite support:
Subject: Testing DataCite Schema 4.5 in the test environment
to: support@datacite.org <support@datacite.org>
Sent: 4/15, 3:59PM
I ran into problems while testing the DataCite Schema 4.5 upgrade in the EZID/DataCite test environment. I created a few DOIs with the three new sub-properties to the publisher property:
publisherIdentifier
publisherIdentifierScheme
schemeUri
These sub-properties are saved as attributes to the publisher element in the XML document in our system. However, these sub-properties are not showing up in the DataCite testing system.
Here are a few sample DOIs:
https://ezid-stg.cdlib.org/manage/display_xml/doi:10.5062/F42R3QTW
with:
<publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
However, the record created in the DataCite test system does not contain the publisherIdentifier related data elements:
https://api.test.datacite.org/dois/10.5062/F42R3QTW
"publisher": "California Digital Library",
"container": {
},
Another example:
https://ezid-stg.cdlib.org/manage/display_xml/doi:10.5062/F4PG1QVD
https://api.test.datacite.org/dois/10.5062/F4PG1QVD
Can you please help troubleshoot this issue?
A note on the "resourcetypeGeneral" support to the datacite profile using the datacite.fieldname format:
datacite.resourcetype field to support general and specific types in the format of: "general type/specific type"
If a datacite.resourcetypegeneral data field is provided, EZID will accept it and keep it as is without processing or validation. This data field will not be sent to DataCite.
Sample record:
_profile: datacite
_target: https://google.com
datacite.creator: test creator
datacite.title: test datacite doi
datacite.publisher: ACM
datacite.publicationyear: 2023
datacite.resourcetype: Book/Large Print
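As an illustration of the "general type/specific type" format, the value in the sample record can be split on the first slash. This is a sketch only; `split_resource_type` is a hypothetical helper and does not show how EZID itself parses the field.

```python
def split_resource_type(value):
    """Split 'general type/specific type' into (general, specific).

    A value without a slash yields an empty specific type.
    """
    general, _, specific = value.partition("/")
    return general.strip(), specific.strip()


if __name__ == "__main__":
    print(split_resource_type("Book/Large Print"))
```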
Resolved:
From: Xiaoli Chen <support@datacite.org>
Date: Tuesday, April 16, 2024 at 2:06 AM
To: Jing Jiang <Jing.Jiang@ucop.edu>
Subject: Re: Testing DataCite Schema 4.5 in the test environment
CAUTION: EXTERNAL EMAIL
Hi Jing,
Thanks for the email.
An additional URL parameter &publisher=true is required to display the publisher information, [here's a bit more information](https://support.datacite.org/docs/can-i-see-more-detailed-affiliation-information-in-the-rest-api#publisher-identifiers).
This query should display the full publisher information:
https://api.test.datacite.org/dois/10.5062/F4PG1QVD?publisher=true
"publisher": {
"name": "California Digital Library",
"schemeUri": "https://ror.org/",
"publisherIdentifier": "https://ror.org/01an7q238",
"publisherIdentifierScheme": "ROR"
},
Hope this helps, let me know if we can support further!
Best regards,
Xiaoli
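The reply above implies that the `publisher` value comes back in two shapes: a plain string without the parameter and an object with it. A minimal sketch for building the query URL and handling both shapes follows; `doi_url` and `publisher_fields` are hypothetical helpers, and the field names are the ones shown in the two response fragments in this thread.

```python
from urllib.parse import urlencode

TEST_API = "https://api.test.datacite.org"


def doi_url(doi, with_publisher=True):
    """Hypothetical helper: build the single-DOI URL, optionally with publisher=true."""
    url = f"{TEST_API}/dois/{doi}"
    if with_publisher:
        url += "?" + urlencode({"publisher": "true"})
    return url


def publisher_fields(publisher):
    """Normalize both response shapes to a dict with at least a 'name' key."""
    if isinstance(publisher, str):
        # Shape returned without publisher=true: just the publisher name.
        return {"name": publisher}
    return dict(publisher)


if __name__ == "__main__":
    print(doi_url("10.5062/F4PG1QVD"))
```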
May 30: Merged the develop branch into merge_bdb_and_dc45 to include Django 4.2.11 upgrade:
CDL-jjiang-9m:ezid jjiang$ git merge develop
Auto-merging impl/form_objects.py
Merge made by the 'ort' strategy.
ansible/templates/etc/httpd/conf.d/03-ezid-nossl.conf.j2 | 7 -
ezidapp/migrations/0001_squashed_0005_rename_index.py | 438 ++++++++++++++++++++++++++++++++
ezidapp/migrations/{ => squashed_migrations}/0001_initial.py | 0
.../migrations/{ => squashed_migrations}/0002_auto_20221026_1139.py | 0
.../migrations/{ => squashed_migrations}/0003_auto_20230809_1154.py | 0
ezidapp/migrations/{ => squashed_migrations}/0004_minter.py | 0
ezidapp/migrations/squashed_migrations/0005_rename_index.py | 158 ++++++++++++
ezidapp/models/identifier.py | 59 ++---
ezidapp/models/link_checker.py | 2 +-
impl/form_objects.py | 2 +-
impl/ui.py | 2 +-
impl/ui_account.py | 4 +-
impl/ui_admin.py | 4 +-
impl/ui_common.py | 4 +-
impl/ui_create.py | 2 +-
impl/ui_manage.py | 2 +-
impl/ui_search.py | 2 +-
requirements-dev.txt | 2 +-
requirements.txt | 2 +-
settings/settings.py.j2 | 7 +
ui_tags/templatetags/manage_form_tags.py | 2 +-
ui_tags/templatetags/menus.py | 2 +-
22 files changed, 647 insertions(+), 54 deletions(-)
create mode 100644 ezidapp/migrations/0001_squashed_0005_rename_index.py
rename ezidapp/migrations/{ => squashed_migrations}/0001_initial.py (100%)
rename ezidapp/migrations/{ => squashed_migrations}/0002_auto_20221026_1139.py (100%)
rename ezidapp/migrations/{ => squashed_migrations}/0003_auto_20230809_1154.py (100%)
rename ezidapp/migrations/{ => squashed_migrations}/0004_minter.py (100%)
create mode 100644 ezidapp/migrations/squashed_migrations/0005_rename_index.py
Quick start guides updated in the attached. If someone could please double-check them, I would appreciate it!
EZID_AdvancedCreate.pdf EZID_ResourceTypes.pdf EZID_RelationTypes.pdf
@adambuttrick Thank you Adam for updating the PDFs. For the advanced create guide:
ResourceTypeGeneral and indicate ResourceType is free text, ResourceTypeGeneral is controlled vocabulary.
The other two files look good to me.
Jing
@jsjiang Revised in the attached. Please let me know if this reflects desired changes.
@adambuttrick Hi Adam,
How about replacing the original ResourceType property line (5th from the top) with the newly added ResourceType (with subproperty resourceTypeGeneral)? An asterisk (*) is needed as it is a mandatory property.
Jing
June 10: Merged the develop branch into the merge_bdb_and_dc45 branch to include Poetry implementation:
(ezid-py38) CDL-jjiang-9m:ezid jjiang$ git config pull.rebase false
(ezid-py38) CDL-jjiang-9m:ezid jjiang$ git pull origin develop
From https://github.com/CDLUC3/ezid
* branch develop -> FETCH_HEAD
Merge made by the 'ort' strategy.
.github/workflows/main.yml | 25 +-
README.2.7.md | 141 +++++++++++
README.md | 313 ++++++++++++++---------
ansible/group_vars/all | 1 +
ansible/notes/notes.migrating_to_poetry | 237 +++++++++++++++++
ansible/roles/ezid/tasks/configure_ezid.yaml | 36 +--
poetry.lock | 1424 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
pyproject.toml | 83 ++++++
requirements-dev.txt | 37 +--
requirements.txt | 31 ---
setup.py => setup.py.bk | 0
ui_tags/templatetags/decorators.py | 4 +-
update_pyproject.sh | 37 +++
13 files changed, 2158 insertions(+), 211 deletions(-)
create mode 100644 README.2.7.md
create mode 100644 ansible/notes/notes.migrating_to_poetry
create mode 100644 poetry.lock
delete mode 100644 requirements.txt
rename setup.py => setup.py.bk (100%)
create mode 100755 update_pyproject.sh
Final revision to advanced create: EZID_AdvancedCreate.pdf
CDL accounts with DataCite DOIs:
select user.id as user_id, user.username, user.displayName, shoulder.prefix as shoulder_prefix, shoulder.name as shoulder_name from ezidapp_user user
left join ezidapp_user_shoulders user_shoulders
on user.id = user_shoulders.user_id
left join `ezidapp_shoulder` shoulder
on shoulder.id = user_shoulders.shoulder_id
where shoulder.type = 'DOI' and shoulder.crossrefEnabled = 0
and (user.displayName like 'CDL%' or user.username='dmptool')
order by user.id, shoulder.prefix;
user_id | username | displayName | shoulder_prefix | shoulder_name
65 | eschol_harvester | CDL eScholarship | doi:10.15779/J2 | Berkeley Law Library
65 | eschol_harvester | CDL eScholarship | doi:10.15779/Z38 | Berkeley Law School Journals
65 | eschol_harvester | CDL eScholarship | doi:10.20353/K3 | UC Observatories
65 | eschol_harvester | CDL eScholarship | doi:10.21418/G8 | UCB Geotechnical Engineering Research
65 | eschol_harvester | CDL eScholarship | doi:10.21980/J8 | UCI JETem
65 | eschol_harvester | CDL eScholarship | doi:10.23733/M3 | UCLA Music Library
65 | eschol_harvester | CDL eScholarship | doi:10.34940/E2 | CDL eScholarship (Supplemental Material)
65 | eschol_harvester | CDL eScholarship | doi:10.34950/E2 | UC Berkeley eScholarship (Supplemental Material)
65 | eschol_harvester | CDL eScholarship | doi:10.34951/E2 | UC Davis eScholarship (General)
65 | eschol_harvester | CDL eScholarship | doi:10.48453/S3 | Ultrasound in Resource-Limited Settings
65 | eschol_harvester | CDL eScholarship | doi:10.6074/D4 | eschol DataCite
65 | eschol_harvester | CDL eScholarship | doi:10.7268/P1 | UC Riverside Bourns College of Engineering
65 | eschol_harvester | CDL eScholarship | doi:10.7286/ | Biocode Commons (UCB only)
124 | merritt | CDL UC3 Merritt | doi:10.17916/P6 | UC Press
124 | merritt | CDL UC3 Merritt | doi:10.18736/D6 | UCOP Dryad
124 | merritt | CDL UC3 Merritt | doi:10.25338/B8 | UC Davis Bio Agr Eng Dash
124 | merritt | CDL UC3 Merritt | doi:10.25349/D9 | UCSB Dash
124 | merritt | CDL UC3 Merritt | doi:10.5068/D1 | UCLA Dash
124 | merritt | CDL UC3 Merritt | doi:10.6071/M3 | UCM Dash
124 | merritt | CDL UC3 Merritt | doi:10.6071/Z7 | UCM SSCZO
124 | merritt | CDL UC3 Merritt | doi:10.6075/J0 | UCSD
124 | merritt | CDL UC3 Merritt | doi:10.6076/D1 | UCSD Dryad
124 | merritt | CDL UC3 Merritt | doi:10.6078/D1 | UCB Dash
124 | merritt | CDL UC3 Merritt | doi:10.6086/D1 | UC Riverside DASH
124 | merritt | CDL UC3 Merritt | doi:10.7268/P1 | UC Riverside Bourns College of Engineering
124 | merritt | CDL UC3 Merritt | doi:10.7272/Q6 | UCSF Clinical & Translational Science Institute (CTSI)
124 | merritt | CDL UC3 Merritt | doi:10.7280/D1 | UCI Dash
124 | merritt | CDL UC3 Merritt | doi:10.7291/D1 | UCSC Dash
124 | merritt | CDL UC3 Merritt | doi:10.7297/X2 | UC Berkeley Department of Linguistics
124 | merritt | CDL UC3 Merritt | doi:10.7941/D1 | LBNL Dash
318 | dash | CDL UC3 Dash | doi:10.17916/P6 | UC Press
318 | dash | CDL UC3 Dash | doi:10.18736/D6 | UCOP Dryad
318 | dash | CDL UC3 Dash | doi:10.25338/B8 | UC Davis Bio Agr Eng Dash
318 | dash | CDL UC3 Dash | doi:10.25349/D9 | UCSB Dash
318 | dash | CDL UC3 Dash | doi:10.5068/D1 | UCLA Dash
318 | dash | CDL UC3 Dash | doi:10.6071/M3 | UCM Dash
318 | dash | CDL UC3 Dash | doi:10.6075/J0 | UCSD
318 | dash | CDL UC3 Dash | doi:10.6076/D1 | UCSD Dryad
318 | dash | CDL UC3 Dash | doi:10.6078/D1 | UCB Dash
318 | dash | CDL UC3 Dash | doi:10.6086/D1 | UC Riverside DASH
318 | dash | CDL UC3 Dash | doi:10.7272/Q6 | UCSF Clinical & Translational Science Institute (CTSI)
318 | dash | CDL UC3 Dash | doi:10.7280/D1 | UCI Dash
318 | dash | CDL UC3 Dash | doi:10.7291/D1 | UCSC Dash
318 | dash | CDL UC3 Dash | doi:10.7941/D1 | LBNL Dash
468 | dmptool | DMPTool | doi:10.48321/D1 | CDL DMPTool
Updated Quick Start Guides (3 PDFs, commit bea2719f4910787fb5d359f4ed1e4c892c4e2f1b)
as tag v3.2.12rc0 on ezid-stg for testing (Jun 20)
Testing log
Version 2.2 records with resourceTypeGeneral:
Version 2.2 records without resourceTypeGeneral: updated these records using the batch-register3.py script
resourceTypeGeneral
resourceTypeGeneral
Script output:
1,doi:10.5062/F44M92G2,
2,doi:10.5062/F40V89RB,
3,doi:10.5060/D4RN35SD/ZMUTT1894081101,
4,doi:10.5060/D4RN35SD/AGS_M_58_S63,
doi:10.5062/F40V89RB metadata on EZID:
{
"datacite.title": "Using ISI Web of Science to Compare Top-Ranked Journals to the Citation Habits of a \"Real World\" Academic Department",
"datacite.creator": "Jeremy Cusker",
"datacite.publisher": "Issues in Science and Technology Librarianship",
"datacite.publicationyear": "2012"
}
10.5062/f44m92g2 and 10.5062/f40v89rb. Both records were updated to schema v4 with resourceTypeGeneral="Other" and resourceType=(:unav)
DMPTool tested DOI registration on stage:
https://dmphub.uc3stg.cdlib.net/dmps/10.48321/D156A867AF
Updated to v4 record with resourceTypeGeneral="OutputManagementPlan" and resourceType=Data Management Plan
<resource xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.48321/D156A867AF</identifier>
<creators>
<creator>
<creatorName nameType="Personal">Maria Praetzellis</creatorName>
<nameIdentifier schemeURI="https://orcid.org/" nameIdentifierScheme="ORCID">
https://orcid.org/0000-0001-5047-3090
</nameIdentifier>
<affiliation affiliationIdentifier="https://ror.org/01an7q238" affiliationIdentifierScheme="ROR">
University of California, Berkeley
</affiliation>
</creator>
</creators>
<titles>
<title xml:lang="en-US">testing</title>
</titles>
<publisher xml:lang="en-US">DMPTool</publisher>
<publicationYear>2024</publicationYear>
<language>en</language>
<resourceType resourceTypeGeneral="OutputManagementPlan">Data Management Plan</resourceType>
<descriptions>
<description xml:lang="en" descriptionType="Abstract"> </description>
</descriptions>
<contributors>
<contributor contributorType="Producer">
<contributorName nameType="Organizational">University of California, Berkeley</contributorName>
<nameIdentifier schemeURI="https://ror.org/" nameIdentifierScheme="ROR"> https://ror.org/01an7q238 </nameIdentifier>
</contributor>
</contributors>
<fundingReferences>
<fundingReference>
<funderName>National Institutes of Health (nih.gov)</funderName>
<funderIdentifier funderIdentifierType="ROR">https://ror.org/01cwqze88</funderIdentifier>
</fundingReference>
</fundingReferences>
</resource>
Testing with below:
import argparse
import requests


def fetch_datacite_metadata(doi, test):
    # Choose the DataCite test or production REST API.
    if test:
        url = f"https://api.test.datacite.org/dois/{doi}"
    else:
        url = f"https://api.datacite.org/dois/{doi}"
    # Ask for the record as DataCite XML rather than JSON.
    headers = {"Accept": "application/vnd.datacite.datacite+xml"}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    else:
        raise Exception(f"Error: Unable to retrieve record (Status code: {response.status_code})")


def save_xml_to_file(doi, xml_content):
    # Slashes in the DOI would be read as path separators, so replace them.
    filename = f"{doi.replace('/', '_')}.xml"
    with open(filename, 'w', encoding='utf-8') as file:
        file.write(xml_content)
    return filename


def main():
    parser = argparse.ArgumentParser(description="Fetch DataCite metadata in XML format.")
    parser.add_argument("-d", "--doi", required=True, help="DOI of the record to retrieve.")
    parser.add_argument("-t", "--test", action='store_true', help="Use DataCite test vs. prod")
    args = parser.parse_args()
    try:
        xml_content = fetch_datacite_metadata(args.doi, args.test)
        filename = save_xml_to_file(args.doi, xml_content)
    except Exception as e:
        print(e)


if __name__ == "__main__":
    main()
10.5062/f44m92g2
Registered - https://api.test.datacite.org/dois/10.5062/f44m92g2
Updated to v4 record with resourceTypeGeneral="Other" and resourceType=(:unav)
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.5062/F44M92G2</identifier>
<creators>
<creator>
<creatorName>Dianne Dietrich et al</creatorName>
</creator>
</creators>
<titles>
<title>De-Mystifying the Data Management Requirements of Research Funders</title>
</titles>
<publisher>Issues in Science and Technology Librarianship</publisher>
<publicationYear>2012</publicationYear>
<resourceType resourceTypeGeneral="Other">(:unav)</resourceType>
</resource>
10.5062/f40v89rb
Registered - https://api.test.datacite.org/dois/10.5062/f40v89rb
Updated to v4 record with resourceTypeGeneral="Other" and resourceType=(:unav)
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.5062/F40V89RB</identifier>
<creators>
<creator>
<creatorName>Jeremy Cusker</creatorName>
</creator>
</creators>
<titles>
<title>Using ISI Web of Science to Compare Top-Ranked Journals to the Citation Habits of a "Real World" Academic Department</title>
</titles>
<publisher>Issues in Science and Technology Librarianship</publisher>
<publicationYear>2012</publicationYear>
<resourceType resourceTypeGeneral="Other">(:unav)</resourceType>
</resource>
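A quick way to confirm that a retrieved record was upgraded as described is to read the root namespace and the resourceType element from the XML. The sketch below uses only the standard library; `summarize_record` is a hypothetical helper, and it assumes resourceType is a direct child of the root element, as in the records shown above.

```python
import xml.etree.ElementTree as ET


def summarize_record(xml_text):
    """Hypothetical helper: report schema namespace and resourceType values."""
    root = ET.fromstring(xml_text)
    # A namespaced tag looks like '{namespace}resource'; extract the namespace.
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    prefix = f"{{{namespace}}}" if namespace else ""
    rt = root.find(f"{prefix}resourceType")
    return {
        "namespace": namespace,
        "resourceTypeGeneral": rt.get("resourceTypeGeneral") if rt is not None else None,
        "resourceType": rt.text if rt is not None else None,
    }


if __name__ == "__main__":
    sample = (
        '<resource xmlns="http://datacite.org/schema/kernel-4">'
        '<resourceType resourceTypeGeneral="Other">(:unav)</resourceType>'
        '</resource>'
    )
    print(summarize_record(sample))
```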
Testing log for updating version 3 records:
Querying the DataCite test API returned only two version 3 records:
records metadata are in XML format in EZID
records do not contain the resourceType data field
records owner is admin - these records might have been created for testing
updated 10.5062/f4h41qkt using EZID UI
updated 10.5062/f4f47n92 using Postman
"schemaVersion": "http://datacite.org/schema/kernel-4",
"types": {
"ris": "GEN",
"bibtex": "misc",
"citeproc": "article",
"schemaOrg": "CreativeWork",
"resourceType": "(:unav)",
"resourceTypeGeneral": "Other"
},
Create DOI 10.5072/FK29K4GW2H via UI, incorporating all Schema 4.5 changes.
Created successfully:
<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.5072/FK29K4GW2H</identifier>
<creators>
<creator>
<creatorName>EZID</creatorName>
</creator>
</creators>
<titles>
<title xml:lang="en">Test Study Registration</title>
</titles>
<publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
<publicationYear>2024</publicationYear>
<resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
<contributors>
<contributor contributorType="DataCollector">
<contributorName>Lucky</contributorName>
<familyName>Hakoyama</familyName>
<affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
</contributor>
</contributors>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE00AO</relatedIdentifier>
</relatedIdentifiers>
<fundingReferences>
<fundingReference>
<funderName>Wellcome Trust</funderName>
<funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
<awardNumber>23456</awardNumber>
</fundingReference>
</fundingReferences>
</resource>
Update title using the API:
import argparse
import requests
import urllib.parse


def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Update DOI metadata using EZID API")
    parser.add_argument("-d", "--doi", required=True, help="DOI to update")
    parser.add_argument("-x", "--xml_file", required=True,
                        help="Path to XML file containing metadata")
    parser.add_argument("-u", "--username", required=True,
                        help="EZID username")
    parser.add_argument("-p", "--password", required=True,
                        help="EZID password")
    parser.add_argument("-e", "--environment", required=True, choices=['stg', 'prd'],
                        help="Choose environment: 'stg' for staging, 'prd' for production")
    return parser.parse_args()


def get_base_url(environment):
    if environment == 'stg':
        return "https://ezid-stg.cdlib.org"
    elif environment == 'prd':
        return "https://ezid.cdlib.org"
    else:
        raise ValueError("Invalid environment. Choose 'stg' or 'prd'.")


def encode_doi(doi):
    # Percent-encode every reserved character, including the slash.
    return urllib.parse.quote(doi, safe="")


def read_xml_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()


def prepare_update_data(xml_content):
    # Escape special characters in the XML content so that the whole
    # document fits on a single "datacite:" line of the request body.
    escaped_content = xml_content.replace(
        '%', '%25').replace('\n', '%0A').replace('\r', '%0D')
    return f"datacite: {escaped_content}"


def update_doi_metadata(base_url, doi, xml_content, username, password):
    encoded_doi = encode_doi(doi)
    update_data = prepare_update_data(xml_content)
    # POST to the identifier URL updates an existing identifier.
    response = requests.post(
        f"{base_url}/id/{encoded_doi}",
        auth=(username, password),
        data=update_data.encode('utf-8'),
        headers={'Content-Type': 'text/plain; charset=UTF-8'}
    )
    return response


def handle_response(response, doi):
    if response.status_code == 200:
        print(f"Successfully updated the metadata of {doi}")
        print(f"Response: {response.text}")
    else:
        print(f"Failed to update the metadata. Status code: {response.status_code}")
        print(f"Response: {response.text}")


def main():
    args = parse_arguments()
    base_url = get_base_url(args.environment)
    xml_content = read_xml_file(args.xml_file)
    response = update_doi_metadata(
        base_url,
        args.doi,
        xml_content,
        args.username,
        args.password
    )
    handle_response(response, args.doi)


if __name__ == "__main__":
    main()
Title updated successfully:
https://ezid-stg.cdlib.org/manage/display_xml/doi:10.5072/FK29K4GW2H
<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.5072/FK29K4GW2H</identifier>
<creators>
<creator>
<creatorName>EZID</creatorName>
</creator>
</creators>
<titles>
<title xml:lang="en">API Update - Test Study Registration</title>
</titles>
<publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
<publicationYear>2024</publicationYear>
<resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
<contributors>
<contributor contributorType="DataCollector">
<contributorName>Lucky</contributorName>
<familyName>Hakoyama</familyName>
<affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
</contributor>
</contributors>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE00AO</relatedIdentifier>
</relatedIdentifiers>
<fundingReferences>
<fundingReference>
<funderName>Wellcome Trust</funderName>
<funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
<awardNumber>23456</awardNumber>
</fundingReference>
</fundingReferences>
</resource>
Create DOI 10.5072/FK29K4GB3I via the API, incorporating all Schema 4.5 changes.
import argparse
import requests
import urllib.parse


def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Create or update DOI metadata using EZID API")
    parser.add_argument("-d", "--doi", required=True,
                        help="DOI to create or update")
    parser.add_argument("-x", "--xml_file", required=True,
                        help="Path to XML file containing metadata")
    parser.add_argument("-u", "--username", required=True,
                        help="EZID username")
    parser.add_argument("-p", "--password", required=True,
                        help="EZID password")
    parser.add_argument("-e", "--environment", required=True, choices=['stg', 'prd'],
                        help="Choose environment: 'stg' for staging, 'prd' for production")
    parser.add_argument("-a", "--action", required=True, choices=['create', 'update'],
                        help="Choose action: 'create' for new DOI, 'update' for existing DOI")
    return parser.parse_args()


def get_base_url(environment):
    if environment == 'stg':
        return "https://ezid-stg.cdlib.org"
    elif environment == 'prd':
        return "https://ezid.cdlib.org"
    else:
        raise ValueError("Invalid environment. Choose 'stg' or 'prd'.")


def encode_doi(doi):
    # Percent-encode every reserved character, including the slash.
    return urllib.parse.quote(doi, safe="")


def read_xml_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()


def prepare_update_data(xml_content):
    # Escape percent signs and line breaks so that the whole document
    # fits on a single "datacite:" line of the request body.
    escaped_content = xml_content.replace(
        '%', '%25').replace('\n', '%0A').replace('\r', '%0D')
    return f"datacite: {escaped_content}"


def create_or_update_doi_metadata(base_url, doi, xml_content, username, password, action):
    encoded_doi = encode_doi(doi)
    update_data = prepare_update_data(xml_content)
    # Both actions use the same URL: PUT creates, POST updates.
    url = f"{base_url}/id/{encoded_doi}"
    method = requests.put if action == 'create' else requests.post
    response = method(
        url,
        auth=(username, password),
        data=update_data.encode('utf-8'),
        headers={'Content-Type': 'text/plain; charset=UTF-8'}
    )
    return response


def handle_response(response, doi, action):
    if response.status_code in [200, 201]:
        print(f"Successfully {'created' if action == 'create' else 'updated'} the metadata of {doi}")
        print(f"Response: {response.text}")
    else:
        print(f"Failed to {'create' if action == 'create' else 'update'} the metadata. Status code: {response.status_code}")
        print(f"Response: {response.text}")


def main():
    args = parse_arguments()
    base_url = get_base_url(args.environment)
    xml_content = read_xml_file(args.xml_file)
    response = create_or_update_doi_metadata(
        base_url,
        args.doi,
        xml_content,
        args.username,
        args.password,
        args.action
    )
    handle_response(response, args.doi, args.action)


if __name__ == "__main__":
    main()
Created successfully:
<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.5072/FK29K4GB3I</identifier>
<creators>
<creator>
<creatorName>EZID</creatorName>
</creator>
</creators>
<titles>
<title xml:lang="en">API Update - Test Study Registration</title>
</titles>
<publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
<publicationYear>2024</publicationYear>
<resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
<contributors>
<contributor contributorType="DataCollector">
<contributorName>Lucky</contributorName>
<familyName>The Poodle</familyName>
<affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
</contributor>
</contributors>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE10AO</relatedIdentifier>
</relatedIdentifiers>
<fundingReferences>
<fundingReference>
<funderName>Wellcome Trust</funderName>
<funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
<awardNumber>23456</awardNumber>
</fundingReference>
</fundingReferences>
</resource>
Update via the UI, altering 4.5 fields. Updated successfully:
<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
<identifier identifierType="DOI">10.5072/FK29K4GB3I</identifier>
<creators>
<creator>
<creatorName>EZID</creatorName>
</creator>
</creators>
<titles>
<title xml:lang="en">API Update - Test Study Registration</title>
</titles>
<publisher publisherIdentifier="https://ror.org/03yrm5c26" publisherIdentifierScheme="ROR" schemeURI="https://ror.org/">California Digital Library</publisher>
<publicationYear>2024</publicationYear>
<resourceType resourceTypeGeneral="StudyRegistration">Registered Study</resourceType>
<contributors>
<contributor contributorType="DataCollector">
<contributorName>Lucky</contributorName>
<familyName>The Poodle</familyName>
<affiliation affiliationIdentifier="https://ror.org/03nfnrd41" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org/">Dogs Trust</affiliation>
</contributor>
</contributors>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="IGSN" relationType="IsPartOf">10.58052/IEBWE10AO</relatedIdentifier>
</relatedIdentifiers>
<fundingReferences>
<fundingReference>
<funderName>Wellcome Trust</funderName>
<funderIdentifier funderIdentifierType="ROR">https://ror.org/029chgv08</funderIdentifier>
<awardNumber>23456</awardNumber>
</fundingReference>
</fundingReferences>
</resource>
Production deployment:
preparation:
0006_alter_searchidentifier_searchableresourcetype.py - Done
merge_bdb_and_dc45 to develop, then merged develop to main (June 25) - Done
v3.2.12 and release note: Wed June 26 - Done
v3.2.12 on ezid-stg: Wed June 26 - Done
prod deployment: Thursday June 27, 8am
[x] Deploy code
[x] run migration command
[ ] test deployment
Production deployment:
v3.2.13 and release notes
v3.2.13 to ezid-stg
verify_ezid_after_patching.py script - passed
ezid_ui_tests.py script - passed
Deployed v3.2.13 on uc3-ezidui-prd02 first:
python manage.py migrate at 6:52am
Deployed v3.2.13 on uc3-ezidui-prd01 at 6:59am.
Prd02 at 6:50am:
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Exec[ansible-playbook]: Triggered 'refresh' from 2 events
Info: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Exec[ansible-playbook]: Scheduling refresh of Service[ezid]
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Service[ezid]: Triggered 'refresh' from 1 event
Notice: Applied catalog in 63.06 seconds
uc3_pupapply.sh: Execution of 'puppet apply' complete. Exit code: 2
uc3puppet@uc3-ezidui-prd02:~> exit
logout
uc3-ezidui-prd02:/home/jjiang>sudo su - ezid
Last login: Wed Jul 10 06:49:58 PDT 2024
ezid@uc3-ezidui-prd02:06:51:28:~$ ps -ef | grep ezid
ezid 1181285 1 0 06:50 ? 00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid 1181286 1181285 2 06:50 ? 00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid 1181287 1181285 0 06:50 ? 00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid 1181288 1181285 0 06:50 ? 00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid 1181289 1181285 0 06:50 ? 00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
ezid 1181558 1181285 0 06:50 ? 00:00:00 /usr/sbin/httpd -d /ezid/etc/httpd -f /ezid/etc/httpd/conf/httpd.conf -DNO_DETACH -k start
root 1181614 1175628 0 06:51 pts/0 00:00:00 sudo su - ezid
root 1181616 1181614 0 06:51 pts/1 00:00:00 sudo su - ezid
root 1181617 1181616 0 06:51 pts/1 00:00:00 su - ezid
ezid 1181618 1181617 0 06:51 pts/1 00:00:00 -bash
ezid 1181848 1181618 0 06:51 pts/1 00:00:00 ps -ef
ezid 1181849 1181618 0 06:51 pts/1 00:00:00 grep --color=auto ezid
ezid@uc3-ezidui-prd02:06:51:33:~$ cdlsysctl stop ezid
Failed to stop ezid.service: Access denied
See system logs and 'systemctl status ezid.service' for details.
ezid@uc3-ezidui-prd02:06:51:47:~$ sudo cdlsysctl stop ezid
ezid@uc3-ezidui-prd02:06:51:58:~$ pwd
/ezid
ezid@uc3-ezidui-prd02:06:52:00:~$ cd ezid
ezid@uc3-ezidui-prd02:06:52:03:~/ezid$ python manage.py migrate
System check identified some issues:
WARNINGS:
?: (mysql.W002) MySQL Strict Mode is not set for database connection 'default'
HINT: MySQL's Strict Mode fixes many data integrity problems in MySQL, such as data truncation upon insertion, by escalating warnings into errors. It is strongly recommended you activate it. See: https://docs.djangoproject.com/en/4.2/ref/databases/#mysql-sql-mode
Operations to perform:
Apply all migrations: admin, auth, contenttypes, ezidapp, sessions
Running migrations:
Applying ezidapp.0006_alter_searchidentifier_searchableresourcetype... OK
Prd01 at 6:59am:
Info: Systemd::Daemon_reload[ezid-proc-search-indexer.service]: Scheduling refresh of Service[ezid-proc-search-indexer.service]
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Uc3_ezid_ui::Background_job[proc-search-indexer]/Systemd::Unit_file[ezid-proc-search-indexer.service]/Service[ezid-proc-search-indexer.service]: Triggered 'refresh' from 2 events
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Uc3_ezid_ui::Background_job[proc-stats]/Systemd::Unit_file[ezid-proc-stats.service]/Systemd::Daemon_reload[ezid-proc-stats.service]/Exec[systemd-ezid-proc-stats.service-systemctl-daemon-reload]: Triggered 'refresh' from 1 event
Info: Systemd::Daemon_reload[ezid-proc-stats.service]: Scheduling refresh of Service[ezid-proc-stats.service]
Notice: /Stage[main]/Uc3_ezid_ui/Uc3_ezid_ui::Config[production]/Uc3_ezid_ui::Background_job[proc-stats]/Systemd::Unit_file[ezid-proc-stats.service]/Service[ezid-proc-stats.service]: Triggered 'refresh' from 2 events
Notice: Applied catalog in 97.32 seconds
uc3_pupapply.sh: Execution of 'puppet apply' complete. Exit code: 2
uc3puppet@uc3-ezidui-prd01:~> exit
logout
uc3-ezidui-prd01:/home/jjiang>sudo su - ezid
Last login: Wed Jul 10 06:57:12 PDT 2024
ezid@uc3-ezidui-prd01:06:59:03:~$ cd ezid-ops-scripts/scripts/
ezid@uc3-ezidui-prd01:06:59:09:~/ezid-ops-scripts/scripts$ python verify_ezid_after_patching.py -e prd
ok 1.1 - Verify EZID status
info 1.2 - EZID version - v3.2.13
ok 2 - Verify search function
## Create identifier
ok 3.1 - doi:10.15697/FK2T616 created
ok 3.2 - doi:10.15697/FK2PC91 created
ok 3.3 - ark:/99999/fk47q0nr3s created
ok 3.4 - doi:10.5072/FK22R40T38 created
ok 3.5 - doi:10.5072/FK2Z03956W created
ok 3.6 - ark:/99999/fk43x9xx80 created
ok 3.7 - doi:10.5072/FK2T72KC3P created
ok 3.8 - ark:/99999/fk4059742f created
## Update identifier
ok 4 - ark:/99999/fk4059742f updated with new data: b'_target: https://cdlib.org/services/\n'
## Check background job status
ok 5.1 - ezid-proc-binder active running
ok 5.2 - ezid-proc-cleanup-async-queues active running
error 5.3 - ezid-proc-celery is not running
ok 5.4 - ezid-proc-crossref active running
ok 5.5 - ezid-proc-datacite active running
ok 5.6 - ezid-proc-download active running
ok 5.7 - ezid-proc-expunge active running
ok 5.8 - ezid-proc-newsfeed active running
ok 5.9 - ezid-proc-search-indexer active running
ok 5.10 - ezid-proc-stats active running
ok 5.11 - ezid-proc-link-checker active running
ok 5.12 - ezid-proc-link-checker-update active running
## Check batch download from S3
waiting for file to become available: 5s passed
waiting for file to become available: 10s passed
waiting for file to become available: 15s passed
waiting for file to become available: 20s passed
waiting for file to become available: 25s passed
ok 6 - batch download file is available at: https://ezid.cdlib.org/s3_download/t2OJoO013P4iacBh.csv.gz
Post-deployment tests:
verify_ezid_after_patching.py script - passed
ezid_ui_tests.py script - passed
Seeing RDS memory decrease starting from 6:15 this morning. Available memory capacity dropped from 1K to 137MB and has stayed low, in the 100MB range. It is probably not deployment related. I will create a ticket for this.
IAS ticket: https://github.com/cdlib/cdlsys/issues/538
When DataCite's schema 4.5 is public, we need to upgrade EZID to support DOIs registered with this version of the schema. Ideally this will be done as soon as possible following the public release of the new version, so that users can take advantage of the new metadata fields available, and so that EZID does not fall behind in its support for DataCite DOIs.
Requirements:
Changes in version 4.5 that need to be reflected in EZID are as follows:
Add "Instrument" as an allowed resource type in the resourceTypeGeneral field. In the UI, this needs to be added to the dropdown of resource types (insert it after "Image" and before "Interactive Resource").
Add "StudyRegistration" as an allowed resource type in the resourceTypeGeneral field. In the UI dropdown, this should be written with a space ("Study Registration") and inserted between "Standard" and "Text"
Add two new relation types in the relationType field: "IsCollectedBy" and "Collects". In the UI dropdown, insert "Collects" between "Cites" and "Compiles". Insert "Is Collected By" between "Is Cited By" and "Is Compiled By".
Add additional subfields to the Publisher element: (1) publisherIdentifier; (2) publisherIdentifierScheme; (3) schemeURI. In the UI, these should be new fields similar to those created for affiliation identifiers. These subfields are not mandatory but they must be used as a group. In other words, if a publisher identifier is provided, a scheme is also mandatory.
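The group rule for the Publisher subfields can be sketched as a small check. This is only an illustration under stated assumptions, not EZID's actual validation code: the function name `check_publisher_subfields` and the dictionary layout are hypothetical, and only the rule spelled out above (an identifier requires a scheme) is enforced, with schemeURI left optional.

```python
from typing import Dict, List, Optional


def check_publisher_subfields(publisher: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the subfields are consistent."""
    problems = []
    # Treat missing keys, None and whitespace-only strings as "not provided".
    identifier = (publisher.get("publisherIdentifier") or "").strip()
    scheme = (publisher.get("publisherIdentifierScheme") or "").strip()
    if identifier and not scheme:
        problems.append(
            "publisherIdentifierScheme is required when publisherIdentifier is given"
        )
    return problems


# Values taken from the sample record; the dictionary shape is an assumption.
complete = {
    "publisherIdentifier": "https://ror.org/03yrm5c26",
    "publisherIdentifierScheme": "ROR",
    "schemeURI": "https://ror.org/",
}
missing_scheme = {"publisherIdentifier": "https://ror.org/03yrm5c26"}

print(check_publisher_subfields(complete))
print(check_publisher_subfields(missing_scheme))
print(check_publisher_subfields({}))
```

Returning a list of messages instead of raising on the first problem would let a UI form show every issue next to the relevant field at once.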
Full schema documentation:
Notes from DataCite:
Key milestones
Tests
Test documentation
Deployment plan