force11 / force11-sciwg

FORCE11 Software Citation Implementation Working Group
https://www.force11.org/group/software-citation-implementation-working-group
BSD 3-Clause "New" or "Revised" License

Generating DOIs for "unversioned" software "packages"/"works" #73

Open danielskatz opened 5 years ago

danielskatz commented 5 years ago

Based on discussion in Section 3.5 of A&P google doc

moranegg commented 5 years ago

In my opinion, DOIs should be reserved for the unversioned software concept/work/product and not for individual software versions, which is not the case at the moment.

Having 10 versions of a software product with the same metadata but different source code will result in 10 different DOIs plus, with Zenodo, a "versionless" DOI. Each researcher who cites the software could choose a different DOI to cite and the citation count will be impacted (unless indexers are aware of this ambiguity).

This subject should be discussed in the joint RDA-FORCE11 Software Identification WG.

alee commented 5 years ago

We had been considering going down this rabbit hole of versioned DOIs (copying Zenodo's model), issuing a DOI for each specific release/version of a software project in addition to a rollup DOI for the entire project. But after more thought (thanks for sharing your opinion on this @moranegg :+1:), I'm now leaning towards pointing the DOI at the entire software project, and enthusiastically recommending that people who cite the software include the specific version number in the citation text itself (whether it be in semver, calver, or whatever format).

It seems much simpler than having to deal with nested DOIs, and it requires less work all around.

danielskatz commented 5 years ago

"I'm now leaning towards pointing the DOI at the entire software project, and enthusiastically recommending that people who cite the software include the specific version number in the citation text itself (whether it be in semver, calver, or whatever format)."

I strongly disagree.

The software citation principles say "Specificity: Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms." And I think the identifier should clearly identify the version that was used.

If the version information is only in the citation text, this will only work for papers, not other products, and only for papers where the citation text itself is actually parsed, rather than just the identifier.

I think that

"Each researcher who cites the software could choose a different DOI to cite and the citation count will be impacted (unless indexers are aware of this ambiguity)."

is exactly the correct behavior that we want, since different versions will have different authors. I agree that we do need to work with indexers to create ways to count citations for groups of versions.

augustfly commented 5 years ago

Seconded, @danielskatz. There are many open questions for indexers to handle when rolling up or describing versions, but that is true in any future where we do more than track the "cites" relationships on the network graph. Indexers figured out how to count preprint + postprint citations pretty quickly, so I don't see why they can't (or maybe it is suggested that they shouldn't) figure out how to roll up versioned citations, especially if the metadata relationships exist. And there will be versions of postprints to roll up very soon, as well as supplemental data citation rollups, etc.

On the subject of the original issue, is this the same problem as identifying a "software paper" as (one of) the representative object(s) for a piece of software? In other words, lots of people write papers about software. These are often concept papers that abstractly represent the software created (let's just assume the software described is open source one way or another).

Regardless of whether they mint an archival DOI (or a website indexes a concurrent commit hash around the same time as the paper), someone has to be able to assert a relationship stating that the paper is about a piece of software. Not doing that makes it very hard for anyone downstream to identify the paper as being about software (or about software and 10 other things), or to roll up citations to that paper as a proxy for the piece of software.
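
For illustration only, here is a minimal sketch of how an indexer might roll up citations when the metadata relationships discussed above exist. Everything in it is an assumption: the DOIs are made up, the data is hypothetical, and the helper names (`rollup_target`, `count_citations`) are invented for this example. The relation names `IsVersionOf` and `Describes` are borrowed from DataCite's relationType vocabulary, but no particular indexer is known to work this way.

```python
from collections import Counter

# Hypothetical relationship metadata as (subject, relation, object) triples.
# All DOIs below are made up for illustration.
RELATIONS = [
    ("10.1234/sw.v1", "IsVersionOf", "10.1234/sw"),  # version -> unversioned work
    ("10.1234/sw.v2", "IsVersionOf", "10.1234/sw"),
    ("10.5555/paper", "Describes", "10.1234/sw"),    # software paper -> software
]

# Relations that this sketch treats as "count toward the object".
ROLLUP_RELATIONS = {"IsVersionOf", "Describes"}

# Hypothetical citation events: the identifier each citing work used.
CITED = [
    "10.1234/sw.v1",
    "10.1234/sw.v2",
    "10.1234/sw.v2",
    "10.1234/sw",
    "10.5555/paper",
    "10.9999/other",
]


def rollup_target(identifier, relations=RELATIONS):
    """Return the identifier that citations of `identifier` roll up to."""
    for subject, relation, obj in relations:
        if subject == identifier and relation in ROLLUP_RELATIONS:
            return obj
    # No asserted relationship: the identifier only counts for itself.
    return identifier


def count_citations(cited, relations=RELATIONS):
    """Return (per-identifier counts, rolled-up counts) for a list of citations."""
    per_identifier = Counter(cited)
    rolled_up = Counter(rollup_target(i, relations) for i in cited)
    return per_identifier, rolled_up


per_identifier, rolled_up = count_citations(CITED)
print(per_identifier["10.1234/sw.v2"])  # citations of one specific version
print(rolled_up["10.1234/sw"])          # versions + concept DOI + software paper
```

In this sketch the per-version counts are kept alongside the rolled-up total, so version-specific credit (different versions may have different authors) is not lost. Without the `Describes` triple, citations to the paper would not be counted toward the software at all, which is the gap described above.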