Status: closed by jsjiang 1 year ago
Related issue https://github.com/CDLUC3/ezid/issues/378
Procedure
Query (select_datacite_metadata.sql in uc3-ezidui02x2-stg:/home/jjiang/ezid/fix_datacite_xml), returns 40K+ records:
select id, identifier, owner_id, ownergroup_id, metadata,
JSON_UNQUOTE(JSON_EXTRACT(metadata, '$.datacite')) as datacite_xml
from ezidapp_identifier where identifier like 'doi%'
-- '\%' matches a literal percent sign, i.e. the string 'kernel-%s'
and metadata like '%kernel-\%s%';
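Trimming the outer quotes from JSON_EXTRACT's result leaves JSON escapes in place, so every quote around an XML attribute value comes back as \". Decoding the value as JSON, which is what MySQL's JSON_UNQUOTE does, avoids that. The Python sketch below mirrors the selection criteria for a single row. The function names are illustrative only, and it assumes the metadata column holds a JSON object whose "datacite" key is a string.

```python
import json


def select_record(identifier, metadata_text):
    """Mirror the WHERE clause of select_datacite_metadata.sql for one row.

    Returns the decoded DataCite XML if the row matches, else None.
    """
    # identifier like 'doi%' (MySQL's default collation is case-insensitive)
    if not identifier.lower().startswith("doi"):
        return None
    # metadata like '%kernel-\%s%': "\%" is a literal percent sign,
    # so this is a plain substring test for "kernel-%s".
    if "kernel-%s" not in metadata_text.lower():
        return None
    # Decoding the JSON string turns \" back into ", like JSON_UNQUOTE.
    return json.loads(metadata_text).get("datacite")


def trim_quotes_only(metadata_text):
    """Mimic TRIM(BOTH '"' FROM JSON_EXTRACT(metadata, '$.datacite'))."""
    # JSON_EXTRACT returns the value as JSON text, with inner quotes escaped.
    json_text = json.dumps(json.loads(metadata_text)["datacite"])
    # Stripping the outer quotes leaves the inner \" escapes untouched.
    return json_text.strip('"')
```

For a row whose "datacite" value is `<resource xmlns="..."/>`, select_record returns that XML unchanged, while trim_quotes_only returns it with `\"` in place of each attribute quote.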
Output file:
uc3-ezidui02x2-stg:/home/jjiang/ezid/fix_datacite_xml/data_files>wc -l datacite_records_to_fix_prd.tsv
40295 datacite_records_to_fix_prd.tsv
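A minimal sketch for reading the export back into Python. It assumes the file was written by the mysql client in batch mode (tab-separated, one header row, no CSV quoting). The column names come from the query. read_records is a hypothetical helper, and it does not undo mysql's backslash escaping of tabs, newlines and backslashes inside values.

```python
import csv

# Column order of select_datacite_metadata.sql's result set.
COLUMNS = ["id", "identifier", "owner_id", "ownergroup_id", "metadata", "datacite_xml"]


def read_records(path):
    """Yield one dict per data row of the exported TSV file."""
    # Metadata and XML fields can exceed csv's default 128 KiB field limit.
    csv.field_size_limit(2**31 - 1)
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: the mysql client does not quote fields, so a leading
        # double quote must be read as an ordinary character.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header row
        for row in reader:
            yield dict(zip(COLUMNS, row))
```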
Update function:
import base64

import requests


def update_datacite_xml(id, datacite_xml, base_url, passwd):
    """Re-save an EZID identifier so that EZID regenerates its DataCite XML.

    datacite_xml is unused: no data needs to be sent for this fix, because
    the element to be fixed is in the resource tag, which datacite.py creates.
    """
    url = f"{base_url}/id/{id}"
    # datacite = f"datacite: {datacite_xml}"
    headers = {
        "Content-Type": "text/plain; charset=UTF-8",
        "Authorization": "Basic " + base64.b64encode(f"admin:{passwd}".encode("utf-8")).decode("utf-8"),
    }
    try:
        # r = requests.post(url=url, data=datacite, headers=headers)
        r = requests.post(url=url, headers=headers, timeout=60)
        # EZID's reply body starts with "success:" or "error:"
        return r.text
    except requests.RequestException as e:
        print(e)
        return None
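A possible driver loop, sketched under assumptions: run_batch and update_fn are hypothetical names, each record is a dict with an "identifier" key holding the EZID name (for example doi:10.7941/D1V63M, since EZID's API addresses identifiers by name in /id/{identifier}), and the 0.5 s pause is an arbitrary throttle, not an EZID requirement. EZID replies begin with "success:" or "error:", which is what the loop checks.

```python
import time


def run_batch(records, update_fn, delay_seconds=0.5, log=print):
    """Call update_fn(identifier) for each record and log EZID's reply.

    Returns two lists: identifiers that succeeded and identifiers that failed.
    """
    succeeded, failed = [], []
    for rec in records:
        identifier = rec["identifier"]
        reply = update_fn(identifier)  # EZID reply text, or None on a network error
        if reply and reply.startswith("success:"):
            succeeded.append(identifier)
        else:
            failed.append(identifier)
        log(reply if reply else f"error: no response for {identifier}")
        time.sleep(delay_seconds)  # throttle requests to the EZID API
    return succeeded, failed
```

For the real run, update_fn could be `lambda ident: update_datacite_xml(ident, None, base_url, passwd)`, since update_datacite_xml never reads its datacite_xml argument. The failed list then gives the identifiers to retry.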
Last 5 records:
success: doi:10.7941/D1TP7N - reserved - updated in EZID - not showing on DataCite
success: doi:10.7941/D1V63M - public - updated in EZID - showing correctly on DataCite
success: doi:10.7941/D1WK8M - public - updated in EZID - showing correctly on DataCite
success: doi:10.7941/D1ZD0G - reserved - updated in EZID - not showing on DataCite
success: doi:10.7941/D1ZS7X - public - updated in EZID - showing correctly on DataCite
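To spot-check results like these on the DataCite side, something along these lines could work. It assumes DataCite's public REST API (https://api.datacite.org/dois/{doi}) returns the metadata base64-encoded in data.attributes.xml and answers 404 for DOIs it does not hold. The function names are hypothetical, and str.removeprefix needs Python 3.9+.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request

DATACITE_DOIS = "https://api.datacite.org/dois/"


def xml_from_payload(payload):
    """Decode the base64 XML from a DataCite /dois/{doi} JSON response."""
    encoded = payload["data"]["attributes"].get("xml")
    return base64.b64decode(encoded).decode("utf-8") if encoded else None


def fetch_datacite_xml(ezid_identifier, timeout=30):
    """Return DataCite's XML for an EZID DOI, or None if DataCite has no record."""
    doi = ezid_identifier.removeprefix("doi:")
    url = DATACITE_DOIS + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as e:
        if e.code == 404:  # DataCite has no record (assumed for reserved DOIs)
            return None
        raise
    return xml_from_payload(payload)


def xml_is_fixed(xml_text):
    """True if XML is present and no longer contains the literal "kernel-%s"."""
    return xml_text is not None and "kernel-%s" not in xml_text
```

`xml_is_fixed(fetch_datacite_xml("doi:10.7941/D1V63M"))` makes one network call. For a DOI DataCite does not hold, the fetch returns None and xml_is_fixed reports False rather than raising.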
Note: the ezid-client-tools/batch-register3.py script can be used to reprocess the records without updating existing metadata.
Over 40K DataCite records contain invalid XML metadata due to a program bug. Develop a process to batch-fix the metadata in both the EZID and DataCite systems.
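Because the bad value sits in the resource tag, a structural check with ElementTree can confirm that a record is well-formed and that `<resource>` no longer carries an unformatted placeholder. This is a sketch, not the fix itself. resource_tag_problems is a hypothetical name, and the check assumes the placeholder shows up as a literal "%s" in the element's namespace or in an attribute such as xsi:schemaLocation.

```python
import xml.etree.ElementTree as ET


def resource_tag_problems(xml_text):
    """List problems found in the root <resource> element of a DataCite record.

    Checks that the XML parses, that the root is <resource>, and that neither
    its namespace nor any attribute value still contains a literal "%s".
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return [f"not well-formed: {e}"]
    problems = []
    # ElementTree stores the namespace in the tag: "{namespace}resource".
    namespace, _, local = root.tag.rpartition("}")
    if local != "resource":
        problems.append(f"root element is <{local}>, expected <resource>")
    if "%s" in namespace:
        problems.append(f"unformatted namespace: {namespace.lstrip('{')}")
    # xmlns declarations are not in attrib; ordinary attributes such as
    # xsi:schemaLocation are, keyed as "{uri}localname".
    for name, value in root.attrib.items():
        if "%s" in value:
            problems.append(f"unformatted attribute {name}: {value}")
    return problems
```

An empty list means the resource tag passed these checks. Running it over the datacite_xml values before and after the batch update gives a quick count of records that still need attention.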