Consider removing CITATION.cff from minimal

abelsiqueira commented 3 months ago

Description

CITATION.cff could be in recommended instead.

Validation and testing

No response

Motivation

CITATION.cff is often only relevant in research, so it could be part of recommended only. The counterpoints for keeping it are that it is not invasive and that we want to recommend some best practices, which might include citation. On the other hand, citation is mostly relevant in research, so we go in circles.

Target audience

No response

Can you help?

No response

ReubenJ commented 1 week ago

For a research use-case where this is indeed a relevant question: for Herb.jl we use an umbrella package setup where it doesn't make sense for the individual repositories (like HerbSpecification.jl) to have a CITATION.cff file. I'm trying out BestieTemplate on our projects right now, and I'll need to delete the citation file every time I (re-)apply the template. In this case, it'd be handy for its existence to be configurable.

abelsiqueira commented 5 days ago

That's an interesting reason. Do you generate DOIs for the subpackages, though? It feels like one should, in this case.

ReubenJ commented 3 days ago

It's a good question and prompted some discussion among the group. Do you happen to know of any guidelines on what should have a DOI? It seems like both approaches could make sense. While each subrepo has its own version, citing all of them does seem a little verbose.

abelsiqueira commented 3 days ago

I also don't have a final answer, so this reply is much longer than expected. Sorry in advance.

I feel like one of the three resources listed in https://www.esciencecenter.nl/knowledge-base/ (fairsoftware.nl, Turing Way, SMP) will have at least some general information. @egpbos, @c-martinez, maybe either of you can share something more concrete?

My reflection at the moment is:

- Add a DOI to anything that moves. This ensures that every release is stored for posterity.
  - I would also generate DOIs for experiments: a collection of scripts and data (or ways to get that data) in a git repo with a Julia environment. In that case, the Manifest.toml should be included as well.
- Add a CITATION.cff to these repos. Maybe I don't want or need the citation, but at least the metadata there can be useful for various things.
  - In particular, Zenodo generates a DOI automatically after a GitHub release (if you set it up). If there is a CITATION.cff file, Zenodo will use that to generate the metadata of the DOI.
- Have a main citable object. Either a paper or the main software.
  - Papers are still easier to get cited (hopefully that'll change eventually), so I would submit the software to JOSS or JORS if I didn't have an associated domain paper.
- Use the preferred-citation field of the CFF to list that main object in every CITATION.cff.
  - I'm not up-to-date on tooling, but I hope that things like Mendeley and Papers eventually pick it up.
- Update the message field, and add a note in the README/docs asking people to cite using the preferred-citation data in the CITATION.cff file.
- Also ask people to list all versions of the package and their dependencies.
  - As you said, it's verbose to cite everything, so as a compromise for reproducibility, having the versions listed would be important.
  - Also notice that if the citer uses an experiment repo, adds a Manifest.toml file there, and puts that in a DOI archive, then it's implicitly covered.

Also, it might be useful to list the dependent DOIs (maybe as reference)?
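
To make this a bit more concrete, here is a minimal sketch of what a subpackage's CITATION.cff could look like, assuming CFF 1.2.0. Every title, name, version, and DOI below is a placeholder, not real metadata for Herb.jl or any other package. Using the `references` field for the dependent DOIs is only one possible way to do it.

```yaml
# CITATION.cff for a hypothetical subpackage (all values are placeholders)
cff-version: 1.2.0
title: "SubPackage.jl"
# The message is what people see first, so it points at the main citable object
message: "If you use this software, please cite the main work listed under preferred-citation."
type: software
authors:
  - family-names: "Doe"
    given-names: "Jane"
version: "0.1.0"
doi: "10.5281/zenodo.0000000"  # placeholder for this package's own DOI
# The main citable object (paper or main software), repeated in every repo
preferred-citation:
  type: article
  title: "Placeholder title of the main paper"
  authors:
    - family-names: "Doe"
      given-names: "Jane"
  journal: "Placeholder Journal"
  year: 2024
  doi: "10.0000/placeholder"
# Optional: DOIs of the packages this one depends on
references:
  - type: software
    title: "DependencyPackage.jl"
    authors:
      - family-names: "Doe"
        given-names: "Jane"
    version: "0.2.0"
    doi: "10.5281/zenodo.0000001"  # placeholder
```

With something like this in each repo, the metadata stays available for Zenodo, while the message and preferred-citation steer people toward the single main reference.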

It is a complicated topic, so I hope my eScience colleagues can share some opinions.

ReubenJ commented 3 days ago

Thanks for all of the input! I like the solution of having the main citable source + a DOI for the experiments which then includes exact versions of everything via the Manifest.toml. That seems like a clean solution. I'm curious to hear what the others have to say as well.

c-martinez commented 2 days ago

Hi @ReubenJ,

Short answer: I agree with what @abelsiqueira already said.

A bit longer: from the FAIR4RS Principles point of view, I think generating DOIs for subpackages and experiments would be in line with principle I2, "Software includes qualified references to other objects."

What I2 tells us is that ideally, your software should have references to libraries and dependencies it uses, and that ideally those libraries and dependencies should have persistent identifiers (DOIs).

(here is where I start trying to think how this would work in Julia, and must admit I have no idea if this holds in practice)

I imagine a typical scenario would be something like this:

A researcher uses Herb to solve some problem. They would create a Problem.jl, which would implement the solution to their problem. Problem.jl will use Herb.jl (which would in turn require HerbSpecification.jl). For their solution to be reproducible, they should provide Problem.jl and specify which version of Herb.jl and HerbSpecification.jl they used.

From this point of view, I think Problem.jl should have a DOI, Herb.jl should have a DOI and HerbSpecification.jl should have a DOI. And I guess, all of these would be in the Manifest.toml file you mention?

Does this help?

egpbos commented 1 day ago

Completely agree with what @abelsiqueira and @c-martinez said. For me, it would also be a matter of keeping things simple. If I build a package (which implies it is some stand-alone reusable thing), I just make it easily citable, in case anybody does want to reuse it and is also nice enough to cite it. I remember that during my PhD 10+ years ago, when I first tried to cite software, it was quite an effort to cite even some of the biggest packages out there (stuff from the scientific Python stack). My past self would be thankful to anyone who made that easier :)

ReubenJ commented 1 day ago

Thank you all for your input—makes sense!

abelsiqueira / BestieTemplate.jl