Open only1chunts opened 4 years ago
When this has been done we also need a script to run over the database and update things in DataCite to the new XML so that we capture as many links as possible in DataCite. Perhaps not ALL datasets, maybe limit it to those with a release date of 2019 or newer.
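A minimal sketch of the selection step such a script might use, assuming each dataset record carries a DOI suffix and a release date. The key names `identifier` and `publication_date` and the function name are assumptions for illustration, not confirmed GigaDB column names; only the `10.5524` prefix comes from this thread.

```python
from datetime import date

# Cutoff suggested above: datasets released in 2019 or newer.
CUTOFF = date(2019, 1, 1)


def select_datasets_to_update(datasets, cutoff=CUTOFF):
    """Return the DOIs of datasets released on or after the cutoff.

    `datasets` is an iterable of dicts with the assumed keys
    "identifier" (DOI suffix) and "publication_date" (datetime.date).
    Records without a release date are skipped.
    """
    selected = []
    for row in datasets:
        released = row.get("publication_date")
        if released is not None and released >= cutoff:
            selected.append(f"10.5524/{row['identifier']}")
    return selected
```

The returned list could then be fed to whatever step regenerates the XML and re-submits it to DataCite.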
@only1chunts says the mint DOI button may use Jesse's GigaDB API to generate the DataCite XML and then submit that to DataCite via their API. I can't think of any reason it would need to update anything within the GigaDB database; it's just taking details from the database and passing them to DataCite.
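A rough sketch of that read-and-forward flow, showing only how the request body could be assembled. It assumes the XML has already been generated elsewhere, and that the DataCite REST API accepts the XML base64-encoded in an `xml` attribute, as I understand its documentation. The function name is hypothetical, and this is not the existing mint DOI code.

```python
import base64


def build_datacite_xml_payload(doi, xml_text):
    """Wrap DataCite XML in a JSON:API style body for the DataCite REST API.

    The XML is base64-encoded into the "xml" attribute (assumption based on
    the DataCite docs). Nothing is written back to the GigaDB database here;
    the function only packages metadata that was read from it.
    """
    encoded = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    return {
        "data": {
            "type": "dois",
            "attributes": {"doi": doi, "xml": encoded},
        }
    }
```

The resulting dict would be serialised with `json.dumps` and sent, with credentials, to the DataCite endpoint for that DOI. That sending step is left out here because it depends on how the button is wired up.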
As an author, I want my dataset to be in DataCite so that it can be automatically propagated to ORCID.
I found an issue that I think you will want to address -- it looks like the names are registered incorrectly in the DataCite database, where the DOI is registered. Specifically, preferred names and given names are reversed. I believe this is an error in the way GigaScience is registering citation information in general, since I can see this goes back to an earlier publication of mine as well from a few years ago. I've tried to explain the problem in detail here, so you can correct it. It will probably require going back and re-updating the registered metadata for these DOIs going in the past.
Here are the details:
When I hit the crossref API with this DOI you provide below, I am told that this DOI is registered by datacite:
http://api.crossref.org/works/10.5524/100936/agency
{
  "status": "ok",
  "message-type": "work-agency",
  "message-version": "1.0.0",
  "message": {
    "DOI": "10.5524/100936",
    "agency": {
      "id": "datacite",
      "label": "DataCite"
    }
  }
}
When I go to datacite and try to pull the metadata with their API, the given names and surnames are specified incorrectly:
https://api.datacite.org/dois/10.5524/100936
{
  "data": {
    "id": "10.5524/100936",
    "type": "dois",
    "attributes": {
      "doi": "10.5524/100936",
      "prefix": "10.5524",
      "suffix": "100936",
      "identifiers": [],
      "alternateIdentifiers": [],
      "creators": [
        {
          "name": "Nathan, Sheffield C.",
          "nameType": "Personal",
          "givenName": "Sheffield C.",
          "familyName": "Nathan",
          "affiliation": [],
          "nameIdentifiers": [
            {
              "schemeUri": "https://orcid.org",
              "nameIdentifier": "https://orcid.org/0000-0001-5643-4068",
              "nameIdentifierScheme": "ORCID"
            }
          ]
        },
        {
          "name": "Michał, Stolarczyk",
          "nameType": "Personal",
          "givenName": "Stolarczyk",
          "familyName": "Michał",
          "affiliation": [],
          "nameIdentifiers": [
            {
              "schemeUri": "https://orcid.org",
              "nameIdentifier": "https://orcid.org/0000-0003-2101-9061",
              "nameIdentifierScheme": "ORCID"
            }
          ]
        },
...
Notice it says my givenName is "Sheffield C." and my familyName is "Nathan". Obviously, my familyName should be "Sheffield" and my givenName should be "Nathan C.". I discovered this when I tried to use an automated citation importer in JabRef so that I could cite this in my GigaScience manuscript; the reference is populated incorrectly:
This should say: Sheffield2021, with "Sheffield, NC", etc. The metadata importer is actually correct -- it's that the names are annotated wrongly in the database. Just to double-check, I tried this for an earlier GigaScience publication of mine from a few years ago, and the same is true there:
https://api.datacite.org/dois/10.5524/100670
{
  "data": {
    "id": "10.5524/100670",
    "type": "dois",
    "attributes": {
      "doi": "10.5524/100670",
      "prefix": "10.5524",
      "suffix": "100670",
      "identifiers": [],
      "alternateIdentifiers": [],
      "creators": [
        {
          "name": "Jason, Smith P.",
          "nameType": "Personal",
          "givenName": "Smith P.",
          "familyName": "Jason",
          "affiliation": [],
          "nameIdentifiers": [
            {
              "schemeUri": "https://orcid.org",
              "nameIdentifier": "https://orcid.org/0000-0002-2688-0988",
              "nameIdentifierScheme": "ORCID"
            }
          ]
        },
        {
          "name": "Michał, Stolarczyk",
          "nameType": "Personal",
          "givenName": "Stolarczyk",
          "familyName": "Michał",
          "affiliation": [],
          "nameIdentifiers": [
            {
              "schemeUri": "https://orcid.org",
              "nameIdentifier": "https://orcid.org/0000-0003-2101-9061",
              "nameIdentifierScheme": "ORCID"
            }
          ]
        },
        {
          "name": "Nathan, Sheffield C.",
          "nameType": "Personal",
          "givenName": "Sheffield C.",
          "familyName": "Nathan",
          "affiliation": [],
          "nameIdentifiers": [
            {
              "schemeUri": "https://orcid.org",
              "nameIdentifier": "https://orcid.org/0000-0001-5643-4068",
              "nameIdentifierScheme": "ORCID"
            }
          ]
        },
I believe you have a systematic error in the way you are populating the datacite metadata. Do you think you can correct this? Thanks,
Nathan
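For reference, a minimal sketch of what a correctly populated `creators` entry could look like when built from separate name fields. The parameter names (`surname`, `first_name`, `middle_name`) are assumptions about how the author record is stored, not the actual GigaDB Author model; the output keys mirror the DataCite JSON quoted above.

```python
def build_creator(surname, first_name, middle_name=None, orcid=None):
    """Build one DataCite `creators` entry from separate name fields.

    The family name goes into "familyName", the given name(s) into
    "givenName", and "name" is written as "Family, Given".
    """
    given = " ".join(part for part in (first_name, middle_name) if part)
    creator = {
        "name": f"{surname}, {given}",
        "nameType": "Personal",
        "givenName": given,
        "familyName": surname,
        "affiliation": [],
        "nameIdentifiers": [],
    }
    if orcid:
        creator["nameIdentifiers"].append(
            {
                "schemeUri": "https://orcid.org",
                "nameIdentifier": f"https://orcid.org/{orcid}",
                "nameIdentifierScheme": "ORCID",
            }
        )
    return creator
```

With the first author from the report, `build_creator("Sheffield", "Nathan", "C.")` yields a `name` of "Sheffield, Nathan C.", which is what the citation importer expects.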
Related to (but not dependent on) #115
DataCite JSON REST API docs: https://support.datacite.org/docs/api
Sample input in JSON format: https://support.datacite.org/docs/api-create-dois
JSON equivalent of the relatedIdentifier element:
{
  "data": {
    "attributes": {
      "relatedIdentifiers": [
        {
          "relatedIdentifier": "https://doi.org/10.xxxx/xxxxx",
          "relatedIdentifierType": "DOI",
          "relationType": "References",
          "resourceTypeGeneral": "Dataset"
        }
      ]
    }
  }
}
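A small sketch of how a body in that shape could be assembled from a list of external links. The input key names (`identifier`, `identifier_type`, `relation_type`, `resource_type`) and the function name are assumptions for illustration; the output keys follow the JSON sample above.

```python
def build_related_identifiers_update(links):
    """Build a request body that sets relatedIdentifiers on a DOI.

    `links` is a list of dicts with the assumed keys "identifier",
    "identifier_type" and "relation_type", plus an optional
    "resource_type" that maps to resourceTypeGeneral.
    """
    related = []
    for link in links:
        entry = {
            "relatedIdentifier": link["identifier"],
            "relatedIdentifierType": link["identifier_type"],
            "relationType": link["relation_type"],
        }
        if link.get("resource_type"):
            entry["resourceTypeGeneral"] = link["resource_type"]
        related.append(entry)
    return {"data": {"attributes": {"relatedIdentifiers": related}}}
```

Which `relationType` is appropriate for each kind of GigaDB link is a mapping decision and is not settled by this sketch.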
https://support.datacite.org/docs/updating-metadata-with-the-rest-api
NB- The DataCite Schema v3.5 (which we currently use) will be deprecated at the end of 2024, so we MUST get this update done before then.
Maybe of interest: here is a presentation by DataCite staff describing the differences in schema v4.5 and the deprecation of the v3 schema: https://youtu.be/i_4Uf_VB5Rw I believe we are actually using v4.0 (which was released in 2016).
(In progress; I will update this comment when it's complete.) Mapping of DataCite schema v4.5 to GigaDB schema terms: https://docs.google.com/spreadsheets/d/18x5l8GU8FNV3Og_RCF_Ei142tSbLv45TZN4Tfqko6sk/edit#gid=1158026556
Can I add a vote/appeal to prioritise updating the metadata as soon as possible, as it should be a very easy and simple update? We need to strike while the iron is hot. If we drop the ball and wait too long to address this, the metadata here will get out of date again, we'll then want to update it again (probably waiting for another opportunity to do so), and it will drag on further and waste the funded internship we got this summer.
The updated metadata is here: https://github.com/Jeffrey-yu-hc/GIGADB-MAPPING/blob/main/gigadb_mapping_coding(final_edition).py
The link @ScottBGI provided is the code written by the intern. It's designed to run in Google Colab as an IPython notebook. In theory it can create DataCite schema 4.5 compatible files for each dataset in GigaDB. He did not get as far as validating any of the output. On visual inspection the output looks good, and the mapping of GigaDB elements to DataCite elements appears correct.
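As a first step towards validating that output, here is a minimal sketch that checks a generated file is well-formed XML and contains the mandatory top-level DataCite properties. It assumes the files use the kernel-4 namespace. It is not a substitute for validating against the official XSD, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by DataCite kernel-4 metadata files.
DATACITE_NS = "http://datacite.org/schema/kernel-4"

# Mandatory top-level properties in the DataCite schema.
REQUIRED = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_required_elements(xml_text):
    """Return the mandatory top-level elements absent from a DataCite XML string.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    Only presence is checked, not content or controlled vocabularies.
    """
    root = ET.fromstring(xml_text)
    return [
        name
        for name in REQUIRED
        if root.find(f"{{{DATACITE_NS}}}{name}") is None
    ]
```

An empty list means all six properties are present; anything else names what the generator left out.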
We have just had another user point out the issue with the GigaDB "Cite Dataset" button: it messes up the authors because DataCite messes up the authors when we pass them the details at present. Updating the schema we use will fix that issue, as long as we also update the details of all existing datasets in DataCite.
FYI - the code run by Jeffery is also in a Colab notebook here: https://colab.research.google.com/drive/1kkf36UdU4BIdP_bmBme4swETCNpZy_uo
@only1chunts in Author model, should the surname and first name be required?
@only1chunts I'm a bit confused about the part concerning the relation and the external links. They don't seem to be related.
@alli83 maybe it's easier to discuss the issues in person, as I'm not quite sure what you're asking and I don't want to confuse things further by giving the wrong details. I can hop on a call today or any day this week, or if it's not urgent, we can wait until Monday's sprint catchup?
@only1chunts I discussed this issue with @rija, but we can wait until the Monday sprint catchup to confirm the approach.
@only1chunts I think Github is not a valid relatedIdentifierType. https://github.com/Jeffrey-yu-hc/GIGADB-MAPPING/blob/main/gigadb_mapping_coding(final_edition).py https://schema.datacite.org/meta/kernel-4.0/doc/DataCite-MetadataKernel_v4.0.pdf
Good spot, you are correct the value "GitHub" is not a valid relatedIdentifierType, we use the value "URL" for the GitHub links instead. Thanks.
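A minimal sketch of that fallback, assuming the link type arrives as a plain label such as "GitHub" or "DOI". The set below is a partial subset of the DataCite relatedIdentifierType controlled list, chosen for illustration, and the function name is hypothetical.

```python
# Partial subset of the DataCite relatedIdentifierType controlled list.
# Not exhaustive: check the schema documentation before relying on it.
VALID_TYPES = {
    "ARK", "arXiv", "DOI", "Handle", "ISBN",
    "ISSN", "PMID", "PURL", "URL", "URN",
}


def related_identifier_type(link_type):
    """Return a valid relatedIdentifierType for a link type label.

    Labels outside the controlled list (for example "GitHub") fall back
    to "URL", which assumes the identifier itself is a web address.
    """
    return link_type if link_type in VALID_TYPES else "URL"
```

The fallback is only safe when the stored identifier really is a URL, so a stricter version might raise an error for unknown labels instead.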
Thanks, I'll update it then. Also, do you have a specific DOI in mind that I could test on?
As in a relatedIdentifier that is a DOI? Dataset 100352 includes this DOI as an external link: http://doi.org/10.17605/OSF.IO/6RTWS Will that do?
Sorry, I meant a DOI to test the XML on.