DOIs for CF Convention releases?

rsignell-usgs commented 6 years ago

Seems like getting a new DOI for each release of CF would be a good idea.

And getting a DOI is pretty easy for GitHub releases: https://guides.github.com/activities/citable-code/

What do folks think?

In March 2023 the CF governance panel decided to use Zenodo for CF DOIs, as reported by Ethan @ethanrd. After the annual meeting in September 2023, Gui @castelao prepared pull request 443 to support CF's adoption of GitHub/Zenodo integration.

davidhassell commented 6 years ago

An excellent idea, I think.

cf-metadata-list commented 6 years ago

Makes a lot of sense to me.

Cheers, Roy.


ethanrd commented 6 years ago

Another option is to have a single DOI and recommend that users include the version number when citing CF.

What URL should result when dereferencing a CF DOI? I would think either the main CF web page or the current CF specification document.

neumannd commented 6 years ago

It sounds like a good idea to assign DOIs for the CF convention documents. The content to which a DOI points has to be invariable. Therefore, a DOI can only be assigned to a particular version of the CF convention document and not to the CF conventions in general.

davidhassell commented 6 years ago

I know that on some DOI services (e.g. https://zenodo.org/) you can have a unique DOI for each release, but also a generic DOI that always resolves to the latest version. I don't know if this feature is ubiquitous, though.

For instance, https://doi.org/10.5281/zenodo.832255 resolves to the latest version of cf-python, whatever it may be. Right now it's v2.1, and v2.1 has its own DOI: https://zenodo.org/record/1039367
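To make the resolution step concrete, here is a minimal Python sketch, assuming only the standard library. `doi_url` and `resolve` are hypothetical helper names, not part of any DOI tooling; `resolve` needs network access and simply reports wherever the resolver redirects to at the time of the call.

```python
from urllib.parse import quote
from urllib.request import urlopen

DOI_RESOLVER = "https://doi.org/"


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI such as '10.5281/zenodo.832255'."""
    doi = doi.strip()
    # Accept DOIs pasted as full resolver URLs or with a "doi:" prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DOI_RESOLVER + quote(doi, safe="/")


def resolve(doi: str, timeout: float = 10.0) -> str:
    """Follow the resolver's redirects and return the landing-page URL.

    Requires network access; the result changes whenever the DOI is
    re-pointed (for a generic DOI, whenever a new version is released).
    """
    with urlopen(doi_url(doi), timeout=timeout) as response:
        return response.geturl()
```

Calling `resolve("10.5281/zenodo.832255")` would return the landing page of whatever cf-python version is the latest at that moment, which is exactly the behaviour described above.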

ethanrd commented 6 years ago

The DOI itself is permanent, the URL that results from dereferencing the DOI can be changed. The object/concept the DOI identifies should be permanent. What that object/concept actually represents and the possible versioning of that object, I believe, is up to those stewarding that object.

DataCite [1] is the DOI minting service I've used. Their metadata schema [2] includes a field for version information. There are some notes on versioning on page 28 of the "DataCite Metadata Schema Documentation for the Publication and Citation of Research Data" [3] including:

Suggested practice: track major_version.minor_version.

Register a new identifier for a major version change. Individual stewards need to determine which are major vs. minor versions.

Not sure what other DOI minting services recommend or how this might work if using the GitHub DOI minting tie-in with FigShare.

[1] https://www.datacite.org

[2] http://doi.org/10.5438/0014

[3] https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf
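The suggested practice quoted above (track `major_version.minor_version`, register a new identifier on a major change) can be sketched as follows. This is only an illustration: the function names are hypothetical, and what counts as "major" remains a decision for the stewards.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor' string such as '1.7' into integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def needs_new_identifier(previous: str, new: str) -> bool:
    """Return True when the major version changed between two releases."""
    return parse_version(new)[0] != parse_version(previous)[0]
```

Under this rule, going from 1.7 to 1.8 would keep the existing identifier, while going from 1.8 to 2.0 would call for a new one.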

ethanrd commented 6 years ago

Yes, I kind of like the idea of having a top-level DOI and one for each version. Though, more DOIs means more things to maintain and more DOIs to include when tracking citations.

With a top-level DOI and individual version DOIs, what would be the recommended citation? Including the version information in the citation is more transparent (at least to the human eye).

The DataCite metadata schema includes a relationType property that can be used to give a relationship with another DOI-ed resource. It has a controlled list of values that includes HasVersion and IsVersionOf. So, perhaps defining these relationships will help ameliorate some of the issues around multiple DOIs.
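As a rough sketch of how those relationships might be recorded, the snippet below builds DataCite-style `relatedIdentifiers` entries in Python. The DOIs are placeholders invented for illustration (nothing has been registered), and the helper names are hypothetical; the property names `relatedIdentifier`, `relatedIdentifierType` and `relationType` follow the DataCite schema.

```python
from typing import Dict, List

# Placeholder DOIs for illustration only; these are not registered.
CONCEPT_DOI = "10.0000/cf-conventions"
VERSION_DOIS = {
    "1.6": "10.0000/cf-conventions.1.6",
    "1.7": "10.0000/cf-conventions.1.7",
}


def related_identifier(doi: str, relation_type: str) -> Dict[str, str]:
    """Build one relatedIdentifiers entry pointing at another DOI."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def concept_relations(version_dois: Dict[str, str]) -> List[Dict[str, str]]:
    """Entries for the top-level record: one HasVersion per release."""
    return [related_identifier(doi, "HasVersion") for doi in version_dois.values()]


def version_relations(concept_doi: str) -> List[Dict[str, str]]:
    """Entries for a single release record: IsVersionOf the top-level DOI."""
    return [related_identifier(concept_doi, "IsVersionOf")]
```

With both directions recorded, a citation of any version DOI can be traced back to the top-level DOI, which is what would help when aggregating citations.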

neumannd commented 6 years ago

OK, thanks for the clarification. I wasn't aware of that possibility.

rsignell-usgs commented 6 years ago

@davidhassell would you be willing to make this happen?

graybeal commented 6 years ago

OK, looks like I'll be the odd one out here. Let me ask a few questions:

What will the DOI(s) be used for that the canonical URLs can not?
What capability do the DOIs have that the canonical URLs do not?
How will you resolve the duality of two canonical references, one being the DOI and the other being the canonical URL?
How will the DOIs representing different versions be recognizably different versions of the same entity/publication?
How will the DOIs be recognizably associated with the CF conventions, without having to actually resolve them? (This, at least, there is a known answer to, just want to be sure we are leveraging it.)

I know the community likes DOIs, but I'm not convinced there is any analytical advantage to the function provided by the DOIs.

dopplershift commented 6 years ago

I completely reject the idea that a URL on the internet is a suitable fixed point of reference. The "canonical URL" for the CF-conventions has changed over time, rendering unusable any publication citation that relied upon that.

DOIs provide a fixed record suitable for citation that is capable of being updated to point to new "landing pages" for the same content.

dblodgett-usgs commented 6 years ago

Assuming someone maintains the mapping between DOI and the intended digital object's current URL.

Otherwise, DOIs become stale unique strings the same as URLs do.

I said I'd stay out of the persistent identifier flame war, but I failed. Maybe we should use blockchain.


dopplershift commented 6 years ago

Sure, everything digital needs upkeep--that's the blessing and the curse.

It's not my area of expertise, so I'm not really qualified to debate this with an informed point of view. Therefore, when it comes to best practice for long-term reference and archival, I'll trust what the experts (i.e. digital library people) tell me to do: DOIs.

graybeal commented 6 years ago

The only reason canonical URIs have to change is that they have been chosen and managed without regard to their final purpose. (Something that DOIs are also vulnerable to, though I agree not as commonly.) Put me in the Cool URIs Don't Change camp.

cf-metadata-list commented 6 years ago

DOIs were designed to decouple content (CF conventions) from the particulars of how and where it's served. In an ideal world URLs wouldn't change, but we all know they do. It's much easier to update the location a DOI resolves to than to set up forwarding from stable URLs on webservers that you may or may not have access to, etc...


davidhassell commented 6 years ago

I am happy to make something happen!

The DOI server would, I think, keep a copy of the versioned document(s), thereby decoupling the need for a stable URL.

graybeal commented 6 years ago

TLDR version: I will not object further nor complain if you go the DOI path (except occasionally with a wink and nudge to close colleagues). Thanks for listening to my input!

I just have a few followups, to fully explain my perspective.

I am not aware of DOI servers being used to archive content. In fact not sure how they would know what to archive, given they just point to another resource, which could have arbitrarily many links to its parts (if the document is maintained as a set of pages, for example). I'm interested to know more.

I accept the judgment of the library community that DOIs are perfect unique identifiers for bibliographic materials, that is their clear community choice. On the other hand, the expert librarians I talk to at Stanford are open to the possibility that DOIs are not the primary references for certain other kinds of digital content. The kind of content where I am most experienced is semantic content, where IRIs are the typical (but not universal) identifier of choice, because of the W3C semantic standards. So, in short, I think one identifier type does not fit all needs.

I accept that DOIs were designed to decouple content; they were poorly designed to resolve content, without knowing what to add to them to make them resolvable. That said, you can generally find a DOI with Google, and yes, DOIs are easy(ier) to re-point by design.

I also concede that the DOI infrastructure is well-enough funded (and consistently-enough-used for this kind of thing) that the DOI infrastructure will not cause as many long-term headaches as most IRIs will. So I will not be trying to argue further, but I do want to note:

updating the DOI requires authority to update the DOI
over time, that authority must be passed on to others in an organized way, ideally through organizational accounts and permissions
if you have not properly prepared your organization for managing the DOIs, you will not be able to update the DOIs without at least some pain and suffering (the more rigorously DOI servers care about transitioning ownership, the more pain and suffering you'll face—since you don't want people stealing your DOI maintenance role from you)
you remain at the mercy of the company managing the DOI, and the services they provide.

These realities seem to map one-to-one with the realities of creating IRIs to decouple the content from the particulars of how and where it's served (I recommend Tim Berners-Lee's Cool URIs document, it's a short read and a fun bit of history). Either way, to have a successful persistent identifier, you have to be thoughtful, you have to invest resources in managing the maintenance and succession processes, and you have to understand that this is an indirection service that is run by an organization, one which you may or may not have full control over for the (eternal!) life of the identifier. If you manage those issues, either technology is equally effective, with only minor differences in cost-per-identifier and user pain to resolve the identifier.

neumannd commented 6 years ago

I just realized that netCDF also has its own DOI as mentioned here:

https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#How-should-I-cite-use-of-netCDF-software

It is written (if the URL does not work at some point in the future):

The registered Digital Object Identifier for all versions of netCDF software is http://doi.org/10.5065/D6H70CW6.

The following can be used as a citation:

Unidata, (year): Network Common Data Form (netCDF) version nc_version [software]. Boulder, CO: UCAR/Unidata. (http://doi.org/10.5065/D6H70CW6)

where year is the year in which the work being described was done and nc_version is the version of netCDF used. For example:

Unidata, (2015): Network Common Data Form (netCDF) version 4.3.3.1 [software]. Boulder, CO: UCAR/Unidata. (http://doi.org/10.5065/D6H70CW6)
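The citation template quoted above is easy to generate programmatically. A minimal sketch in Python (the function name `cite_netcdf` is made up for this example; the wording and DOI are copied from the FAQ text quoted above):

```python
NETCDF_DOI_URL = "http://doi.org/10.5065/D6H70CW6"


def cite_netcdf(year: int, nc_version: str) -> str:
    """Fill in the netCDF citation template with a year and a version."""
    return (
        f"Unidata, ({year}): Network Common Data Form (netCDF) "
        f"version {nc_version} [software]. Boulder, CO: UCAR/Unidata. "
        f"({NETCDF_DOI_URL})"
    )
```

The point for CF is that a single DOI plus an explicit version string in the citation text is enough to identify the version that was used.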

dblodgett-usgs commented 6 years ago

Was there a conclusion to this issue? Is someone going to move it forward?

taylor13 commented 6 years ago

Could we discuss this at the meeting in Reading in June?

castelao commented 6 years ago

I believe that there was an agreement in Reading to create a DOI for the CF convention documentation. Is that correct? If so, shall we discuss the details on how to do it?

We have a few options on how to implement it. One of them is using Zenodo as suggested by @rsignell-usgs , which would also archive the document itself as mentioned by @davidhassell , and would allow a general DOI grouping all releases as suggested by @ethanrd . I use Zenodo in other projects and it is minimal work to operate in a GitHub environment.

I'm checking an alternative through the UCSD library, which offers similar resources, and I just learned that they operate in partnership with NCAR. I'll post here once I have some news.

rsignell-usgs commented 6 years ago

@castelao , thanks for picking this issue up again!

erget commented 6 years ago

I was really impressed by Zenodo and think it would be a great idea - lots of benefits, low workload.

castelao commented 6 years ago

The UCSD library could provide that, but they suggested using Zenodo since it can be integrated with GitHub, which I can confirm is nearly zero maintenance. My contact in the library also mentioned that they trust Zenodo due to the solid institutions that support it.

I can do the repository setup to connect it with Zenodo automatically if there is a consensus to move this forward.

ethanrd commented 6 years ago

As I recall, the decision at the Reading meeting was to mint a DOI for CF in general rather than for any particular version of any particular document. Is there a way using Zenodo with GitHub to mint a DOI that isn't associated with a particular document/artifact/release?

Or, perhaps the overarching DOI should be tied to the CF web page repo rather than the CF conventions document repo. (Seems an appropriate repo since we want the DOI to dereference to https://cfconventions.org.)

PS David and I have started on a meeting summary document. We'll share it out for comment and such once it isn't quite so rough.

castelao commented 6 years ago

Sorry for the delay, I'm back.

Thanks for the correction @ethanrd. Yes, I also recall an agreement for a single DOI. Although I would recommend using a master DOI with one child DOI for each release, it is possible to use a single DOI for the CF concept. Thus it would not be associated to a specific version. In that case, I would recommend it to point to the general https://cfconventions.org website, not the repository.

My question is, how to move forward? If nobody says anything against this in 3 weeks, shall I start implementing such single DOI?

davidhassell commented 6 years ago

Creating a single DOI pointing to https://cfconventions.org would be great, I think, and what was decided at the Reading meeting. We didn't decide to not create further DOIs (e.g. for different conventions versions) simply because we couldn't decide in the limited time how best to proceed. These will come later ...

rsignell-usgs commented 6 years ago

Sounds like a thumbs up, @castelao !

castelao commented 6 years ago

Great! We need to put some information together to move this forward:

Are there funding agencies? Which ones?
Who are the creators? The list of authors in the main document?
There are other categories of contributors that are not creators/authors. Who are the contributors and which category? There are more options if the list below is not enough.
- Editor
- Data collector
- Data curator
- Project leader
- Project manager
- Project member
For everyone who goes in this DOI, it would be nice to have their ORCIDs, so this DOI is connected directly to each one.

The other fields should be straightforward, but I would print it all here for approval before submitting it.

HeinkeH commented 6 years ago

My ORCID: https://orcid.org/0000-0002-0131-1404

Best wishes, Heinke


castelao commented 6 years ago

The required fields to mint a DOI for CF conventions are listed below. I used the documentation authors as creators, but most of the ORCIDs are still missing. These are only suggestions; please let me know if you would like to change any field.

=========

URL: http://cfconventions.org
Title: NetCDF Climate and Forecast (CF) Metadata Conventions
Publisher: ??
Publication Year: 2004 (Since this is the general DOI, should it go with the very first version or the latest?)
Resource Type General: Text
Description:
- Descriptive information: This document describes the CF conventions for climate and forecast metadata designed to promote the processing and sharing of files created with the netCDF Application Programmer Interface [NetCDF]. The conventions define metadata that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF conventions generalize and extend the COARDS conventions [COARDS]. The extensions include metadata that provides a precise definition of each variable via specification of a standard name, describes the vertical locations corresponding to dimensionless vertical coordinate values, and provides the spatial coordinates of non-rectilinear gridded data. Since climate and forecast data are often not simply representative of points in space/time, other extensions provide for the description of coordinate intervals, multidimensional cells and climatological time coordinates, and indicate how a data value is representative of an interval or cell. This standard also relaxes the COARDS constraints on dimension order and specifies methods for reducing the size of datasets.
- Type: Abstract
- Description Language: English
Funding Reference (May have several items):
- Funder Name: ??
- Funder Identifier: ??
- Identifier Type: ??
Creators:
- Brian Eaton
  - Affiliation: NCAR
- Jonathan Gregory
  - Affiliation: University of Reading and UK Met Office Hadley Centre
- Bob Drach
  - Affiliation: PCMDI, LLNL
- Karl Taylor
  - Affiliation: PCMDI, LLNL
- Steve Hankin
  - Affiliation: PMEL, NOAA
- Jon Blower
  - Affiliation: University of Reading
- John Caron
  - Affiliation: UCAR
- Rich Signell
  - ORCID: 0000-0003-0682-9613
  - Affiliation: USGS
- Phil Bentley
  - Affiliation: UK Met Office Hadley Centre
- Greg Rappa
  - Affiliation: MIT
- Heinke Höck
  - ORCID: 0000-0002-0131-1404
  - Affiliation: DKRZ
- Alison Pamment
  - Affiliation: BADC
- Martin Juckes
  - Affiliation: BADC
- Martin Raspaud
  - Affiliation: SMHI
- Randy Horne
  - Affiliation: Excalibur Laboratories, Inc., Melbourne Beach Florida USA
Contributor (May have several items. I suggest to clearly define the criteria to be considered a contributor.):
- Contributor Type: [ Editor | Project Manager | Project Member | Related Person | Researcher | Supervisor | Other ]
- Name:
- ORCID:
- Affiliation:

ngalbraith commented 6 years ago

Thanks, that's great. While I agree that we need to mention standard names, I don't think we can say 'precise definition of each variable via specification of a standard name' though, because CF allows variables without standard names.

Also, I'm afraid some of the details might be better omitted, like 'describes the vertical locations corresponding to dimensionless vertical coordinate values' - which may only confuse people. You covered that nicely with 'spatial and temporal properties of the data', I think.

Last, in a brief description of CF, could we consider omitting references to COARDS? I realize it's good to keep that in the CF docs, but ... is there any part of COARDS that's not described in those docs? I'm almost sure the answer is no. I went to check, however the link to COARDS from unidata.ucar.edu/software/netcdf/conventions.html is broken. Hmm, that tells us something, too.

graybeal commented 6 years ago

I remember doing a similar exercise several years back. The COARDS-unique contribution was a very thin thread indeed, in my opinion. COARDS is very old now, and relative to CF, was pretty primitive at its completion. I would be surprised if it is in use at all. At this point, I think "COARDS-inspired" might be closer to the truth, and a more useful framing.

John


ngalbraith commented 6 years ago

Without wanting to belabor this point, since it's clearly been decided, I'd like to point out that the DOI can in no way replace the description of CF in the Conventions attribute: 'files that follow these conventions indicate this by setting the NUG defined global attribute Conventions to the string value "CF-1.7"'.

The Conventions attribute is human-readable, very important for those of us who use data at sea or otherwise off-line. This leaves the issue of multiple items containing identical (hopefully) information. To me, it's one more reason to use a single DOI for CF, not for each version.
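For reference, reading the CF version back out of a `Conventions` string takes only a few lines. This is a minimal Python sketch: the function name `cf_version` is hypothetical, and it assumes the attribute may list several conventions separated by spaces or commas.

```python
import re
from typing import Optional

# Matches a single token such as "CF-1.7".
_CF_TOKEN = re.compile(r"^CF-(\d+)\.(\d+)$")


def cf_version(conventions: str) -> Optional[str]:
    """Return the CF version (e.g. '1.7') named in a Conventions string."""
    # Assumption: multiple conventions are separated by spaces or commas.
    for token in re.split(r"[,\s]+", conventions.strip()):
        match = _CF_TOKEN.match(token)
        if match:
            return f"{match.group(1)}.{match.group(2)}"
    return None
```

This works entirely off-line, on nothing but the attribute value stored in the file, which is the property being argued for here.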

castelao commented 6 years ago

My understanding is that this DOI would be used as any other reference, which should be included in the list of references at the end. With a single DOI or one for each version, the version identification (e.g. CF-1.7) would go explicitly in the bibliographic references section. In my lab, we explicitly write a line suggesting how to cite our DOIs.

About adopting multiple versions or not, it might be useful to distinguish between major versions, for instance when someone wants to state that they followed CF-1.X or CF-2. I don't have a strong opinion about adopting a DOI for each version, but at least one DOI is crucial to allow any metric of the scientific impact of the CF conventions.

dblodgett-usgs commented 5 years ago

Several months on. Can you update where we are on this @castelao -- or maybe @rsignell-usgs can update the original description to the latest status of the discussion? Or maybe the best would be to close this and open a fresh issue?

martinjuckes commented 5 years ago

I mistakenly raised a new issue on this topic (#206), which I have now closed ... but there were some interesting comments there before I closed it.

I'd like to pick up a couple of points:

(1) @graybeal asked why DOI rather than a stable URL: this relates in part to my specific proposal in #206, which is slightly different from the discussion here in that I was suggesting depositing a pdf document in a secure long term repository which provides a DOI and landing page. This off-loads a lot of the work: it becomes a formal publication process in which we prepare the document and submit it for publication, and then let the repository deal with the long term persistence. People like Zenodo are funded to provide this service for people like us ... we might as well use that service. The DOI also provides a mature and stable citation tracking service.

(2) It is interesting that NetCDF is using a DOI as a pointer to their web page, but I don't think this is a good use of the DOI functionality. The cfconventions.org site includes more than the standard. I would like to see a DOI used for a citable, fixed document. If someone prepares data for publication using a version of the CF Convention, they should be able to reference that specific version. If we do reference cfconventions.org, then the description should refer to the governance process and other aspects of the site, but my preference would be to refer to a fixed CF convention document to support precise and traceable citation of that document.

(3) Zenodo, which is linked to github, provides a service which can issue DOIs for version controlled documents, with a DOI for the collection and individual DOIs for each new published version.

castelao commented 5 years ago

1) Another advantage of a DOI is that, perfect solution or not, this is how we currently track academic impact factor, and that ranking affects the next proposal that anyone is going to write. Thus it can help each contributor justify the time invested in CF.

3) I use Zenodo for all my software development and it works great. For the CF conventions I would expect infrequent updates, so the automatic GitHub-to-Zenodo release integration might not be so important. Zenodo can be a good solution, and I would like to point to UC Libraries with EZID as another alternative. I register DOIs myself with EZID, which gives more freedom in the metadata to be recorded, and the UC Library also provides long-term archiving with more competence than I could describe.

castelao commented 5 years ago

@bnlawrence also raised the point of one DOI per release (see #206 ).

castelao commented 3 years ago

Hi everyone. Is there still interest in moving this forward? Would it make sense to have a breakout room for the CF meeting next month?

erget commented 3 years ago

I would be interested in this, although I'm not an expert on this topic.

HeinkeH commented 3 years ago

I am very interested in it. DOIs for cf documentation and cf-standard name lists. That would be very helpful especially for the FAIR principles.

ngalbraith commented 3 years ago

I agree that both the standard name table and the conventions document would probably need separate DOIs.

We've gotten a DOI for the most recent version of the OceanSITES Data Format Manual. I believe that implies that we'll have to keep this version available in perpetuity, since new versions will have new DOIs. I'm not sure who decided to get a DOI, or what registration agency they used; it just appeared on our document's listing at Ocean Best Practices.

Is there a plan to keep all versions of the CF conventions document available, or am I mistaken in thinking of this as an implication of assigning a DOI? Or, will Zenodo take care of that in some way that other registration agencies don't?

ethanrd commented 3 years ago

Yes, I would like to see this move forward.

It looks like Zenodo automatically supports having both an overarching CF DOI and DOIs for each released version (see the Zenodo FAQ “How does DOI versioning work?”). It also supports creating (by hand) DOIs for already released versions (see Zenodo FAQ “Is it possible to archive a GitHub repository, before it was enabled on Zenodo?”).

Zenodo FAQ “How can I edit the metadata of a published record?” says that “almost all” of the DOI metadata is editable. Does anyone know if the URL to which the DOI resolves is editable? This seems like an important feature of DOIs. So I would assume it is editable. However, I haven’t found this mentioned explicitly and Zenodo FAQ “Where does the Concept DOI resolve to?” says

Currently the Concept DOI resolves to the landing page of the latest version of your record. This is not fully correct, and in the future we will change this to create a landing page specifically representing the concept behind the record and all of its versions.

I suspect this is just the default location and the URL can be modified. But I will try to test and confirm.

[Sorry for the indirect FAQ links. I did not find direct links to the individual FAQ questions.]

castelao commented 3 years ago

It's great to hear that there is interest!

My understanding from the comments is that we agree that the CF conventions should have a DOI and the standard names should also have a DOI. It sounds like a good idea since the standard names table will probably be updated more frequently than the CF conventions document itself. Also, it is expected that both will remain backward cross-compatible.

@ngalbraith, most of the agencies that register DOIs already take care of copying and making that available. So we most probably won't need to deal with that. Independent of that, GitHub also provides that frozen version once we create tags.

Yes @ethanrd , Zenodo has an overarching DOI which can be used when someone wants to mention CF in a generic way. That DOI always points to the latest version. Another case is when a specific version is required, thus the specific version DOI is used. If the idea was to use an overarching DOI for the conventions and the standard names, I don't think that will be straightforward with Zenodo, but I don't think that would be a problem.

I suggest two possible paths to register the DOI, which will somewhat determine the workflow and maintenance. In summary, Zenodo is the least effort since we are already using GitHub. It would be pretty much automatic once we set up everything. The other alternative is through the University of California Library, which would require some manual steps but we would have more freedom to do as we want (potentially taking more advantage of the flexibility of the DOI standard). I use the UC Library to handle datasets at Scripps Institution of Oceanography, and use Zenodo for my open source projects. I don't have a clear opinion on what would be best for CF. I guess that if the standard Zenodo is fine for you all, it would be the easiest to maintain.

There are some tricks to control Zenodo; see https://github.com/castelao/inception for an example. Zenodo takes some information from the repository automatically, but it is limited. Defining a .zenodo.json file resolves that: with it we can, for instance, cross-link the conventions with the standard names, plus any other relevant publication. Note that:

- The version DOI: https://zenodo.org/record/3986146
- A previous version DOI: https://zenodo.org/record/3986145
- The overarching DOI (points to the latest version; useful to aggregate citations): https://doi.org/10.5281/zenodo.3981501
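
As a rough sketch of what such a .zenodo.json could contain, the Python snippet below builds the metadata and serializes it. The field names (`title`, `upload_type`, `creators`, `related_identifiers`) follow Zenodo's deposit metadata as I understand it. The author, ORCID, affiliation, and the DOI `10.5281/zenodo.0000000` are hypothetical placeholders, not real CF records.

```python
import json

# Hypothetical placeholder: not a real DOI for the standard names table.
STANDARD_NAMES_DOI = "10.5281/zenodo.0000000"


def build_zenodo_metadata(version):
    """Return metadata for a .zenodo.json file describing one release."""
    return {
        "title": f"CF Conventions {version}",
        "upload_type": "other",
        "creators": [
            {
                "name": "Surname, Given",           # placeholder author
                "affiliation": "Some Institution",  # placeholder affiliation
                "orcid": "0000-0000-0000-0000",     # placeholder ORCID
            }
        ],
        # Cross-link this record to the (hypothetical) standard names record.
        "related_identifiers": [
            {"identifier": STANDARD_NAMES_DOI, "relation": "references"}
        ],
    }


def to_zenodo_json(metadata):
    """Serialize the metadata as the text of a .zenodo.json file."""
    return json.dumps(metadata, indent=2, sort_keys=True)


print(to_zenodo_json(build_zenodo_metadata("1.9")))
```

The printed text would be saved as `.zenodo.json` at the root of the repository, so the same metadata is picked up on every release.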

Whichever path is chosen, we will need some information anyway, such as the list at https://github.com/cf-convention/cf-conventions/issues/127#issuecomment-432292340 . Is the list of authors correct and complete? It's missing some ORCIDs. What would be the list of contributors? Not only people, but also institutions and agencies. Note that all this metadata can be edited, expanded, and fixed later.

Let me know your questions.

ethanrd commented 3 years ago

Hi all - I was wrong about the ability to change the URL resolved to by a Zenodo DOI. It always points to the corresponding Zenodo archive page. Which makes sense given Zenodo is a repository and not just a DOI minting service.

Hi @castelao - I agree, using the Zenodo/GitHub integration would be much easier in terms of the effort involved in minting and maintaining CF DOIs. On the other hand, using a separate DOI registry service would allow the DOIs to resolve directly to CF pages rather than to the Zenodo archive of the GitHub repository. So the main CF DOI could resolve to the main CF web site (https://cfconventions.org/) and the version DOIs could resolve to the individual CF version pages (e.g., https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html).

The Zenodo DOI metadata can contain related identifiers/URLs which could reference the appropriate CF website page(s). So, with either method, dereferencing DOIs from a citation will lead to the CF content, it will just be an extra step or two with a Zenodo generated DOI.

Perhaps a conversation for the CF Workshop?

davidhassell commented 3 years ago

Hello,

I'd just like to advertise that discussion of this issue has now been added as a breakout session in next week's online CF meeting (https://cfconventions.org/Meetings/2021-Workshop.html). It will be discussed at 15:00 UTC on Thursday 23rd September.

Thanks, David

castelao commented 3 years ago

I spoke with a couple of experts on DOIs, and here is my suggestion.

First, some clarifications:

- The DOI handle that we see, something like 10.21238/S8SPRAY1618, is like a primary key in a database. Associated with that DOI, several fields are stored in a public database, such as creators, license, contributors, funding agencies, and related DOIs. The related-DOIs field is how a DOI is linked to other items. Another field is the URL of the landing page, which is not necessarily the dataset or the document itself, but a human-readable page explaining what it is and where to get it.
- DOIs are not URLs and should not replace URLs. We should keep using URLs on the websites and even in the documentation, while DOIs would be used in the bibliography.
- The DOI handle doesn't change, but the fields of that DOI can be updated.
- Only a few fields are required to register a DOI; most of the fields are optional.
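
To make the "primary key plus editable fields" idea concrete, here is a minimal Python sketch. It is only a toy model, not how any DOI registry is implemented. The class `DoiRecord`, the functions, and the handle `10.9999/cf-example` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DoiRecord:
    """Illustrative subset of the metadata stored against one DOI."""
    title: str
    creators: list
    landing_page: str
    # Maps a relation type (e.g. "isPartOf") to a list of related DOI handles.
    related: dict = field(default_factory=dict)


# Toy registry: the handle is the key, the record holds the editable fields.
registry = {}


def register(handle, record):
    """Register a new handle; an existing handle can never be re-registered."""
    if handle in registry:
        raise ValueError(f"{handle} is already registered")
    registry[handle] = record


def update_landing_page(handle, url):
    """Change a field of an existing record; the handle itself stays fixed."""
    registry[handle].landing_page = url


EXAMPLE_HANDLE = "10.9999/cf-example"  # hypothetical handle
register(
    EXAMPLE_HANDLE,
    DoiRecord(
        title="CF Conventions",
        creators=["Placeholder, Author"],
        landing_page="https://example.org/old-page",
    ),
)
update_landing_page(EXAMPLE_HANDLE, "https://cfconventions.org/")
print(EXAMPLE_HANDLE, "->", registry[EXAMPLE_HANDLE].landing_page)
```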

My suggestion is to create the following structure of DOIs:

(A) Top DOI to aggregate the whole CF
- (B) DOI for cf-conventions (isPartOf A)
  - (E) DOI for conventions version 1.9 (isVersionOf B)
  - (F) DOI for conventions version 1.8 (isVersionOf B)
  - DOI for conventions version 1.7 (isVersionOf B) ...
- (C) DOI for standard names (isPartOf A)
  - (D) DOI for standard names version 75 (isVersionOf C)
  - DOI for standard names version 74 (isVersionOf C)
  - DOI for standard names version 73 (isVersionOf C) ...

One top-level DOI for the whole CF-Conventions, as an evolving concept (A).

The conventions manual and the standard names evolve on different time scales, and one specific convention version could be used with several versions of standard names. Thus let's keep those separate: one DOI for CF-conventions (B) and one for standard names (C).

Each released version receives its own DOI. For instance, standard names version 75 has its own DOI (D), and CF-Conventions 1.9 has its own DOI (E).

This structure allows for granularity on how to use it. Someone interested in reproducibility would be interested in defining one specific version and would use DOIs such as (D), (E), or (F). If a specific version is not required, one would use (C) or (B). Finally, to talk about the whole system, one would use (A). (A) would also be the way to measure the scientific impact (like cited n times) since it aggregates all components.

All these DOIs would be linked. For instance, E isVersionOf B, D isVersionOf C, and C isPartOf A. These links are registered when the DOI is created. Nobody has to worry about this except the person registering the DOI. These links are meant for machines, not for us.
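
The structure above can be sketched in a few lines of Python. This is only an illustration of the proposed links and of how someone would pick which DOI to cite. The letters are the labels used in this comment, not real DOIs, and the function names are hypothetical.

```python
# Label -> (description, relation to parent, parent label)
DOI_TREE = {
    "A": ("CF as a whole", None, None),
    "B": ("cf-conventions", "isPartOf", "A"),
    "C": ("standard names", "isPartOf", "A"),
    "E": ("conventions version 1.9", "isVersionOf", "B"),
    "F": ("conventions version 1.8", "isVersionOf", "B"),
    "D": ("standard names version 75", "isVersionOf", "C"),
}

COMPONENTS = {"cf-conventions": "B", "standard names": "C"}
VERSIONS = {
    ("cf-conventions", "1.9"): "E",
    ("cf-conventions", "1.8"): "F",
    ("standard names", "75"): "D",
}


def ancestors(label):
    """Follow the relation links from a label up to the top-level DOI."""
    chain = []
    while DOI_TREE[label][2] is not None:
        _, relation, parent = DOI_TREE[label]
        chain.append((relation, parent))
        label = parent
    return chain


def label_to_cite(component=None, version=None):
    """Pick the label to cite: whole system, one component, or one version."""
    if component is None:
        return "A"
    if version is None:
        return COMPONENTS[component]
    return VERSIONS[(component, version)]


print(label_to_cite("cf-conventions", "1.9"), ancestors("E"))
```

A citation to (E) can be rolled up to (B) and then to (A) by following the links, which is what would let (A) aggregate citation counts.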

How to manage all this? DOIs for specific versions could be registered using Zenodo. That can be set up to be done automatically every time we do a new release on GitHub. Zenodo takes a snapshot of the repository at the moment of the release and archives it. The URL (landing page) associated with that DOI is the archiving place at Zenodo.

The general DOIs for CF-Conventions and standard names (B & C) would be the overarching DOIs created by Zenodo, which point to the latest version of each line. For instance, if I'm not concerned with the specific version but just want to refer to the standard names, I would use DOI (C), which is automatically updated to point to the latest version of standard names.

Finally, the top-level DOI (A) would be registered manually, without using Zenodo. Some alternative providers are UC Library or UCAR. Note that the content itself is not stored at UC Library or UCAR; they only provide us with access to the registration system. It requires more work since it is manual, but we don't expect regular updates, and it gives us the freedom to define each field as we want. With Zenodo we don't have access to all the fields related to a DOI. For instance, we can't choose a landing page. For the top-level DOI, a good landing page (i.e., the URL associated with that DOI) could be the cfconventions.org website.

In summary, with Zenodo we can save a configuration file in the repository so that everything happens automatically at every release, but we need to conform to the way Zenodo operates. Registering manually is more work but gives more freedom.

If this sounds too complex and there is no interest in granularity to cite specific versions, we can ignore the levels below B & C. If we want a minimalist version and really don't care about versions and reproducibility, we can use (A) only.

Please, let me know your questions.

sethmcg commented 3 years ago

I agree that it makes sense to have a top-level DOI for CF, and then lower-level ones for cf-conventions and the standard names. I think level C is too granular and complex, and that we would be asking the community to commit to maintaining a large number of DOIs in perpetuity for insignificant benefit.

I can see in the abstract that a version-specific DOI might be useful for reproducibility, but do we have a specific use case for which that would be notably better than just using the standard cf-conventions DOI in conjunction with a version string? If not, I think it would be better just to stick with A and B-level DOIs.

zklaus commented 3 years ago

I really like the approach laid out by @castelao. A few points are probably worth mentioning/stressing:

DOIs for Versions

This is baked into Zenodo. If you have a look at the Zenodo FAQ, Section "DOI versioning" you will find more information, but the most salient quote is perhaps:

When you publish an upload on Zenodo for the first time, we register two DOIs:

DOI representing the specific version of your record.

DOI representing all of the versions of your record.

Afterwards, we register a DOI for every new version of your upload.

So using Zenodo, it is almost unavoidable to give a new DOI to every version. Of course we could discourage the use of individual version DOIs or use a different service than Zenodo. However, the similarity of the schemes is no accident: it follows from the idea of referring to an immutable thing with one DOI.

Zenodo contents and Github integration

It is true that the standard Github integration places an archived version of the Github repository into Zenodo. However, this is by no means the only way to do it. We could, for example, archive only the PDF version of the conventions in Zenodo, together with a link to the relevant release/tag on Github, or only the HTML version or both. This is also possible in an automated way with no manual overhead.

Zenodo landing page

As far as I understand it is also true that Zenodo always uses the Zenodo landing page, so if we would like to have a different landing page, for example the CF conventions website, a different service must be used.
