DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Persistent Identifiers for Organizations and People #1

Closed leightonlc closed 8 months ago

leightonlc commented 1 year ago

Leighton Christiansen National Transportation Library, USDOT

Requirement(s)

Metadata fields that record information about organizations and people should also record one or more persistent identifiers for that entity.

publisher: should include subfields for "identifier" and "identifierType", where persistent identifiers such as SAM numbers, Crossref IDs, and/or Research Organization Registry (ROR) IDs can be recorded, with the type specified.

contactPoint: besides "email", contactPoint should include subfields for "identifier" and "identifierType", where persistent identifiers such as ORCIDs, ResearcherIDs, and/or arXiv Author Identifiers can be recorded, with the type specified.
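A minimal Python sketch of how the requested subfields might be assembled and serialized. The wrapper keys `organizationIdentifier` and `personalIdentifier` are the ones this issue's own examples use; placing the value under an `identifier` key follows the requirement text. The helper name `identifier_block` is hypothetical, and none of this is taken from a published DCAT-US schema.

```python
import json


def identifier_block(value, id_type):
    """Return an identifier/identifierType pair as a plain dict.

    Hypothetical helper: the key names follow the wording of this issue,
    not a released DCAT-US schema.
    """
    if not value or not id_type:
        raise ValueError("both identifier and identifierType are required")
    return {"identifier": value, "identifierType": id_type}


record = {
    "publisher": {
        "organizationIdentifier": identifier_block(
            "https://ror.org/02xfw2e90", "ROR"
        ),
    },
    "contactPoint": {
        "personalIdentifier": identifier_block(
            "https://orcid.org/0000-0002-0543-4268", "ORCID"
        ),
    },
}

# Serialize to JSON text; parsing it back yields the same structure.
serialized = json.dumps(record, indent=2)
print(serialized)
```

Requiring both fields together means a consumer never has to guess which registry an identifier belongs to.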

Examples:

"publisher": { "organizationIdentifier": { "identifier": "https://ror.org/02xfw2e90", "identifierType": "ROR" } }

"contactPoint": { "personalIdentifier": { "identifier": "https://orcid.org/0000-0002-0543-4268", "identifierType": "ORCID" } }

Problem Statement

The use of persistent identifiers to unambiguously identify researchers and research-related organizations is standard practice in publishing and repositories. People may change their names several times over a lifetime, or different publishers may use variants of a name, causing confusion about a researcher's lifetime of work. Further, many people share the same name. Persistent identifiers, such as ORCID, help to globally disambiguate researchers.

Organizations may also go through name changes, especially after internal reorganizations, or when rebranding to better express their mission or to use more inclusive language. But their mission, and their role vis-à-vis research, may not change. Further, researchers sometimes do not use the proper or preferred name or acronym for a funding agency or publisher when citing support or publication history. Unique identifiers help to identify a specific organization throughout its lifecycle, and can disambiguate it from an international or regional organization with the same or a similar name.

The use of persistent identifiers is specifically called out by the FAIR Principles (https://www.go-fair.org/fair-principles/) as the first three steps of Findability. Further, the implementation of digital persistent identifiers is required by National Security Presidential Memorandum 33 (NSPM-33) and explained in the guidance document for NSPM-33 at https://www.whitehouse.gov/wp-content/uploads/2022/01/010422-NSPM-33-Implementation-Guidance.pdf

Expanding the use of persistent identifiers will help to bring DCAT-US into closer alignment with standard practice and new federal policies.

Target Audience / Stakeholders

Researchers, repository managers, research funders, data consumers, metadata experts

Intended Uses / Use Cases

In DCAT-US 3

"publisher": { "organizationIdentifier": { "identifier": "https://ror.org/02xfw2e90", "identifierType": "ROR" } }

"contactPoint": { "personalIdentifier": { "identifier": "https://orcid.org/0000-0002-0543-4268", "identifierType": "ORCID" } }

Existing Approaches - Optional

Left blank intentionally

Additional context, comments, or links - Optional

Left blank intentionally

torrin47 commented 1 year ago

Yes, and...

contactPoint is implemented in the DCAT-US schema by means of the vCard specification, which has an optional UID element that would be a good fit for ORCID and easy to implement. Documented here: https://github.com/USEPA/EPA_Environmental_Dataset_Gateway/issues/19 and here: https://github.com/project-open-data/project-open-data.github.io/issues/614
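As a rough illustration of the UID idea, here is a Python sketch that adds an ORCID to a vCard-style contactPoint object. The `fn` and `hasEmail` keys follow the existing DCAT-US contactPoint convention; the `hasUID` key is borrowed from the W3C vCard ontology, and its acceptance in DCAT-US is an assumption here, not something the schema currently confirms. The function name, contact name, and email address are placeholders.

```python
import json


def contact_point(fn, email, orcid_uri=None):
    """Build a vCard-style contactPoint dict, optionally carrying a UID."""
    contact = {
        "@type": "vcard:Contact",
        "fn": fn,
        "hasEmail": f"mailto:{email}",
    }
    if orcid_uri:
        # Assumption: the vCard UID is exposed under the key "hasUID".
        contact["hasUID"] = orcid_uri
    return contact


example = contact_point(
    "Example Contact",            # placeholder name
    "contact@example.gov",        # placeholder address
    "https://orcid.org/0000-0002-0543-4268",
)
print(json.dumps(example, indent=2))
```

Because the UID is optional, existing records without an ORCID would remain unchanged under this approach.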

publisher would need more work to accommodate RORs, but agree fully with the value proposition.

Other important persistent identifiers, such as DOIs and PMCIDs supporting linked open data principles, are often dumped into the "references" array with no additional context. Suggest that any URI or other persistent identifier in DCAT-US be allowed to include an associated human-readable description of what the URI represents and potentially a reference to an issuing authority as described here: https://www.w3.org/TR/vocab-dcat-3/#identifiers-type

Lots more discussion on this topic over here: https://github.com/project-open-data/project-open-data.github.io/issues/592 and https://github.com/project-open-data/project-open-data.github.io/issues/69
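One way to picture the suggestion is a small record type that carries the description and issuing authority alongside the identifier itself. This is a sketch only: the class name, field names, and the DOI and PMCID values are all hypothetical placeholders, not part of DCAT-US or DCAT 3.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DescribedIdentifier:
    value: str                               # the URI or other persistent identifier
    id_type: str                             # e.g. "DOI", "PMCID"
    description: Optional[str] = None        # human-readable note on what it points to
    issuing_authority: Optional[str] = None  # who assigns identifiers of this type

    def to_dict(self):
        # Drop unset optional fields so the serialized entry stays compact.
        return {k: v for k, v in asdict(self).items() if v is not None}


reference = DescribedIdentifier(
    value="https://doi.org/10.1234/placeholder",  # placeholder, not a real DOI
    id_type="DOI",
    description="Journal article describing the dataset",
    issuing_authority="https://www.doi.org/",
)
bare = DescribedIdentifier(value="PMC0000000", id_type="PMCID")  # placeholder

print(reference.to_dict())
print(bare.to_dict())
```

An entry like `bare` still works as a plain reference, while `reference` gives a reader the context that a bare URI in an array lacks.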

fellahst commented 8 months ago

Leighton,

Thank you for highlighting the importance of incorporating persistent identifiers for organizations and individuals in the DCAT-US schema, a practice aligned with FAIR principles and NSPM-33 guidelines. Your proposal to include specific subfields for "identifier" and "identifierType" under 'publisher' and 'contactPoint' is insightful and addresses the need for unambiguous identification in digital publishing and repositories.

The FAIR principles advocate for globally unique and persistent identifiers (F1) and their retrievability through standardized protocols (A1). To this end, generating resolvable URLs in compliance with the RFC 3986 IETF standard and Linked Data best practices is crucial. This includes elements like scheme, authority, path, and local or globally unique identifiers.
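To make the URI elements concrete, the following Python sketch splits the two identifiers from this issue into their RFC 3986 components using the standard library. The function name `uri_components` and the `local_id` key are illustrative choices; treating the last path segment as the local identifier is an assumption that happens to hold for ORCID and ROR URIs.

```python
from urllib.parse import urlsplit


def uri_components(uri):
    """Split a URI into scheme, authority, path, and trailing local identifier."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        # Assumption: the local identifier is the final path segment.
        "local_id": parts.path.rsplit("/", 1)[-1],
    }


orcid = uri_components("https://orcid.org/0000-0002-0543-4268")
ror = uri_components("https://ror.org/02xfw2e90")

print(orcid)
print(ror)
```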

URI resolution services like purl.org, w3ids, doi.org, orcid.org, arxiv.org, and Identifiers.org play a vital role in ensuring consistent access to resources, emphasizing the need for persistent identifiers in data management.

For the 'publisher' field, your suggestion to incorporate identifiers such as SAM numbers, Crossref ids, and ROR ids aligns with these principles. Similarly, for 'contactPoint', including identifiers like ORCIDs or ResearcherID enhances the precision in identifying researchers and organizations.
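Part of that precision can be checked mechanically: the final character of an ORCID iD is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The Python sketch below validates the format and check digit of an ORCID given either as a bare iD or as an https://orcid.org/ URI. The function names are illustrative, and the check only catches malformed or mistyped values; it does not confirm that the iD is registered.

```python
import re

# Accepts a bare iD or the https://orcid.org/ URI form.
ORCID_PATTERN = re.compile(
    r"^(?:https://orcid\.org/)?([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X])$"
)


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value):
    """Return True if the value is well formed and its check digit matches."""
    match = ORCID_PATTERN.match(value)
    if not match:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("https://orcid.org/0000-0002-0543-4268"))
print(is_valid_orcid("0000-0002-0543-4267"))
```

A harvester could run a check like this before accepting a record, so that typos are rejected at ingest rather than surfacing later as broken links.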

Implementing these changes will not only improve the discoverability and interoperability of datasets but also bring DCAT-US in closer alignment with standard practices and federal policies.

We have addressed your requirements in the current DCAT-US 3.0 specification in a number of ways:

Your contribution is valuable to our ongoing efforts to enhance data management practices, and we look forward to getting your feedback on how we addressed your requirements in the DCAT-US 3.0 application profile.

ShaferAC commented 8 months ago

+1

mrratcliffe commented 8 months ago

+1; ORCID 0000-0002-2458-4675 agrees with this ;-)

leightonlc commented 8 months ago

Thank you for the thoughtful integration of my comments. I am looking forward to implementing them. Happy New Year! Leighton
