NASA-PDS / doi-service

Service and tools for generating DOIs for PDS bundles, collections, and data sets
https://nasa-pds.github.io/doi-service

identifiers vs alternateIdentifiers appear disconnected from current DataCite schema #303

Closed jordanpadams closed 2 years ago

jordanpadams commented 2 years ago

🐛 Describe the bug

It looks like our metadata uses `identifiers`, which I think should be `alternateIdentifiers`, and I do not think we are using `relatedIdentifiers` and `alternateIdentifiers` quite correctly.

📜 To Reproduce

Steps to reproduce the behavior: the metadata for PDS4 DOI 10.17189/d3nm-pp09 contains:

  "identifiers": [
    {
      "identifier": "urn:nasa:pds:mars2020_navcam_ops_raw::1.0",
      "identifierType": "URN"
    }
  ]
  ...
  "relatedIdentifiers": [
    {
      "relationType": "IsIdenticalTo",
      "relatedIdentifier": "urn:nasa:pds:mars2020_navcam_ops_raw::1.0",
      "relatedIdentifierType": "URN"
    }
  ],

But the DataCite schema does not seem to define an `identifiers` object. I think we want this to be `alternateIdentifiers`.

Additionally, based on the description of `relatedIdentifiers`, I think these are meant for related products, e.g. other DOIs, or URNs pointing to other systems (or even to PDS itself), but not for other identifiers of the same product. I think `alternateIdentifiers` is the place for those. Also, `alternateIdentifierType` is a free-text field, so we can specify whatever we want there. It looks like OSTI used `Site ID`, which we can continue to use, or we can switch to our own PDS nomenclature.
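To find records affected by this, a check along these lines could flag `relatedIdentifiers` entries that look like synonyms for the same product rather than links to a different one. This is a minimal sketch under stated assumptions: records are plain dicts shaped like the JSON excerpts in this issue, and the flagging rule (relation type `IsIdenticalTo`, or a value already listed among the record's own identifiers) is a hypothetical heuristic, not a DataCite rule. The function name is illustrative.

```python
from typing import Any

# Assumption: a DataCite record is modeled as a plain dict shaped like the
# JSON excerpts in this issue.
Record = dict[str, Any]


def suspicious_related_identifiers(record: Record) -> list[dict[str, str]]:
    """Flag relatedIdentifiers entries that look like synonyms for this same
    product rather than links to a different product.

    Hypothetical heuristic: an entry is flagged if its relationType is
    IsIdenticalTo, or if its value already appears among the record's own
    identifiers / alternateIdentifiers.
    """
    own_values = {e["identifier"] for e in record.get("identifiers", [])}
    own_values |= {
        e["alternateIdentifier"] for e in record.get("alternateIdentifiers", [])
    }
    return [
        rel
        for rel in record.get("relatedIdentifiers", [])
        if rel["relationType"] == "IsIdenticalTo"
        or rel["relatedIdentifier"] in own_values
    ]
```

Run against the PDS4 excerpt above, this would flag the single `IsIdenticalTo` entry, while a genuine link to another product (e.g. an `IsPartOf` relation) would pass through unflagged.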

🕵️ Expected behavior

Here are some example changes.

For a recent DOI minted through DataCite, 10.17189/d3nm-pp09:

Before:

  "identifiers": [
    {
      "identifier": "urn:nasa:pds:mars2020_navcam_ops_raw::1.0",
      "identifierType": "URN"
    }
  ]
  ...
  "relatedIdentifiers": [
    {
      "relationType": "IsIdenticalTo",
      "relatedIdentifier": "urn:nasa:pds:mars2020_navcam_ops_raw::1.0",
      "relatedIdentifierType": "URN"
    }
  ],

After:

  "alternateIdentifiers": [
    {
      "alternateIdentifier": "urn:nasa:pds:mars2020_navcam_ops_raw::1.0",
      "alternateIdentifierType": "Site ID"
    }
  ]

For a PDS3 DOI, 10.17189/1519454:

Before:

  "identifiers": [
    {
      "identifier": "1519454",
      "identifierType": "IAD ID"
    },
    {
      "identifier": "A14A-L-CCIG-3-ATMOS-DENSITY-PLOTS-V1.0",
      "identifierType": "Site ID"
    }
  ],
  "relatedIdentifiers": [
    {
      "relationType": "IsIdenticalTo",
      "relatedIdentifier": "urn:nasa:pds:context_pds3:data_set:data_set.a14a-l-ccig-3-atmos-density-plots-v1.0",
      "relatedIdentifierType": "URN"
    }
  ],

After (we can keep the IAD ID as well, alongside the synonymous local identifiers for this product):

  "alternateIdentifiers": [
    {
      "alternateIdentifier": "1519454",
      "alternateIdentifierType": "IAD ID"
    },
    {
      "alternateIdentifier": "A14A-L-CCIG-3-ATMOS-DENSITY-PLOTS-V1.0",
      "alternateIdentifierType": "Site ID"
    },
    {
      "alternateIdentifier": "urn:nasa:pds:context_pds3:data_set:data_set.a14a-l-ccig-3-atmos-density-plots-v1.0",
      "alternateIdentifierType": "Site ID"
    }
  ],
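The two Before/After pairs above can be expressed as one transformation. The sketch below is hypothetical: it assumes records are plain dicts shaped like these excerpts, and the relabeling rule (URN becomes `Site ID`, any other type such as `IAD ID` is kept) is inferred from the two examples rather than taken from doi-service code. It folds `identifiers` and `IsIdenticalTo` related identifiers into `alternateIdentifiers`, lists each value only once, and drops `relatedIdentifiers` if nothing else remains in it.

```python
import copy
from typing import Any

Record = dict[str, Any]

# Assumption inferred from the examples: URN identifiers are relabeled
# "Site ID"; any other type (e.g. "IAD ID") is kept unchanged.
TYPE_RELABEL = {"URN": "Site ID"}


def migrate_identifiers(record: Record) -> Record:
    """Return a copy of `record` with `identifiers` and IsIdenticalTo
    `relatedIdentifiers` folded into `alternateIdentifiers`."""
    out = copy.deepcopy(record)  # leave the caller's record untouched
    alternates: list[dict[str, str]] = list(out.pop("alternateIdentifiers", []))
    seen = {entry["alternateIdentifier"] for entry in alternates}

    def add(value: str, id_type: str) -> None:
        # Skip values already listed so each synonym appears only once.
        if value not in seen:
            seen.add(value)
            alternates.append({
                "alternateIdentifier": value,
                "alternateIdentifierType": TYPE_RELABEL.get(id_type, id_type),
            })

    for entry in out.pop("identifiers", []):
        add(entry["identifier"], entry["identifierType"])

    kept_related = []
    for related in out.pop("relatedIdentifiers", []):
        if related["relationType"] == "IsIdenticalTo":
            add(related["relatedIdentifier"], related["relatedIdentifierType"])
        else:
            kept_related.append(related)  # genuine links to other products

    if kept_related:
        out["relatedIdentifiers"] = kept_related
    if alternates:
        out["alternateIdentifiers"] = alternates
    return out
```

Applied to the PDS4 Before excerpt, this yields exactly the single `Site ID` entry shown in its After; applied to the PDS3 excerpt, it yields the three entries shown, in the same order.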

📚 Version of Software Used

🩺 Test Data / Additional context

🏞 Screenshots

🖥 System Info


🦄 Related requirements

⚙️ Engineering Details

Here are the actions I think we need to take:

nutjob4life commented 2 years ago

Sort of related to #294; this is the last in a series of related tasks.

For every single DOI record, we're being deliberate about making sure everything is set correctly on the first submission, but we have a lot of records. We're about ⅔ of the way through and hoping to finish before the break next week! 🎄

collinss-jpl commented 2 years ago

@jordanpadams Just a heads-up: after trying some submissions directly to DataCite via curl, it seems that DataCite treats `identifiers` and `alternateIdentifiers` as aliases for one another. For example, if I send an update request that empties `identifiers`, anything in `alternateIdentifiers` is also emptied. Likewise, if I make an update that only provides values for `alternateIdentifiers`, both it and `identifiers` are updated. This isn't a blocker, but I wanted you to be aware of it the next time you're spot-checking records and see that `identifiers` and `alternateIdentifiers` both contain duplicate lists of our LIDVIDs or PDS3 IDs.
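When spot-checking, a quick comparison can tell whether the two lists are just this aliased copy or actually disagree. A minimal sketch, assuming records are plain dicts shaped like the JSON excerpts in this issue and that the aliasing yields the same value/type pairs under both keys (as the duplicate lists described above suggest); the function names are hypothetical.

```python
from typing import Any

Record = dict[str, Any]
Pair = tuple[str, str]


def identifier_pairs(record: Record) -> tuple[set[Pair], set[Pair]]:
    """Normalize both lists to sets of (value, type) pairs for comparison."""
    plain = {
        (e["identifier"], e["identifierType"])
        for e in record.get("identifiers", [])
    }
    alternate = {
        (e["alternateIdentifier"], e["alternateIdentifierType"])
        for e in record.get("alternateIdentifiers", [])
    }
    return plain, alternate


def is_aliased_duplicate(record: Record) -> bool:
    """True when identifiers and alternateIdentifiers carry the same entries,
    i.e. the expected alias effect rather than conflicting metadata."""
    plain, alternate = identifier_pairs(record)
    return plain == alternate
```

A record where this returns `False` would be the one worth a closer look, since the two lists would then hold different content.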

jordanpadams commented 2 years ago

@collinss-jpl Copy that! Very interesting and odd.

tloubrieu-jpl commented 2 years ago

The DOI record updates have been made on pds-prod1, so I am closing the ticket.