CDLUC3 / ezid-service

Restructure DataCite datacenters (repositories) #256

Open mariagould opened 6 months ago

EZID DataCite shoulders are associated with a "datacenter" code (e.g., CDL.UCB), which is a requirement for registering identifiers with DataCite. DataCite currently refers to these datacenters as "repositories" and they are used in DataCite APIs and other services to identify DOIs associated with a particular repository, group, project, or organization.

DataCite's notion of datacenter/repository has gone through multiple iterations. Historically, it was more common for an organization like CDL to have many datacenters, each typically associated with an individual user account and not corresponding to any actual repository structure or grouping. When DataCite adjusted its fee model circa 2019 to base pricing primarily on datacenters, CDL consolidated its datacenters and rolled them up to the campus level (11 total including CDL) to normalize our approach and rationalize the payment structure for our DataCite membership. In 2020/2021, the fee model changed again and CDL became a consortium. As part of this change, DataCite evolved the notion of datacenter into the current repository model, and pricing was no longer determined by repositories.

On the EZID side, we are still operating on a one-repository-per-campus model. This means that DOIs registered by individual repositories are not associated with a specific repository ID in DataCite's APIs and other services, which can inhibit some users' ability to track and identify research outputs. (A possible workaround is to search on a publisher field, affiliation, or prefix.)

EZID has yet to "un-consolidate" its datacenters and align with DataCite's current repository-based structure for a few reasons: (1) the list of datacenters is hard-coded in EZID and can't be changed without significant development work, (2) the implications of losing the campus-based identifier need to be fully understood and explored, and (3) all existing DataCite user accounts in EZID need to be reviewed and investigated to determine how they would map to a new repository structure and the appropriate level of granularity (for example, should there be a single repository for NCEAS, or multiple repositories for different types of repositories/projects that NCEAS is involved in). This work is not trivial and there has not been sufficient bandwidth to take it on in recent years without undermining other work.

To summarize the state of affairs:

More granular datacenters are useful for DataCite Commons and other tracking. DataCite can help move DOIs when we're ready.

Key steps to move this forward

Potential questions to investigate

Notes from October 2022 conversation with DataCite