airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Should we be making AIRR releases citable? #467

Closed bcorrie closed 1 week ago

bcorrie commented 3 years ago

There is a mechanism for this that links GitHub and Zenodo.

https://guides.github.com/activities/citable-code/

bcorrie commented 6 months ago

Do we want to consider this? It looks like the AIRR Community has a Zenodo record and @bussec manages this.

https://zenodo.org/communities/airr-community/records?q=&l=list&p=1&s=10&sort=newest

The process is to enable the GitHub hooks to Zenodo; then, when you tag a release, it is automatically published to Zenodo.

We do this for iReceptor releases (e.g. https://zenodo.org/records/7430516)

javh commented 6 months ago

Seems like an extra layer to me, as we'll have a v2 paper to cite.

bcorrie commented 6 months ago

Not sure I agree - this isn't a question of whether we should make the v2.0 release citable, it is whether we should make all versions citable. We will (we hope) have a paper that matches the v2.0 release, but there is no real mechanism to properly cite any of the intermediate releases. From a FAIR software (or even FAIR standards) and reproducible research perspective, how can I cite that I used v1.4.1 of the Python library? Or that my data is in a format that adheres to the v1.4.1 AIRR specification (not 1.0, not 1.2, not 2.0)?

Yes, you can point to PyPI or a GitHub release tag, but neither is the equivalent of having a DOI/PID for that release.
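
As a rough illustration of what a version-specific reference could look like in practice, here is a minimal Python sketch. It assumes the library is installed under the PyPI name `airr`; the function names are made up for this example, and the DOI is a placeholder, not a real record.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def methods_citation(name, release, doi):
    """Build a one-line, version-specific reference for a Methods section."""
    return f"{name}, version {release}. https://doi.org/{doi}"


if __name__ == "__main__":
    # "airr" is assumed to be the installed package name.
    release = installed_version("airr") or "unknown"
    # Placeholder DOI for illustration only; use the DOI of the exact release.
    print(methods_citation("AIRR Python library", release, "10.5281/zenodo.0000000"))
```

The point of the sketch is only that the version and the persistent identifier travel together, so a reader can resolve the exact release without searching for it.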

javh commented 6 months ago

What would be the benefit of citing a specific version as opposed to just stating the version when you cite the paper? Would this be instead of citing the paper?

bussec commented 6 months ago

For the record: We have been doing this for the last 6 years ;-) see here DOI:10.5281/zenodo.1185414. Somehow the automatic process broke between 1.3.1 and 1.4, I will fix this within the next few days.

javh commented 6 months ago

lol. Well, that settles that then. :)

bcorrie commented 6 months ago

For the record: We have been doing this for the last 6 years ;-) see here DOI:10.5281/zenodo.1185414. Somehow the automatic process broke between 1.3.1 and 1.4, I will fix this within the next few days.

Excellent...

bcorrie commented 6 months ago

What would be the benefit of citing a specific version as opposed to just stating the version when you cite the paper? Would this be instead of citing the paper?

The point is that the DOI is a PID for the release itself. It is both a mechanism for making research more reproducible and a mechanism for giving credit for software (in the FAIR4RS space) and standards (in the AIRR case) as outputs of the research process. In our case, the paper is a description of the AIRR Standards, but it captures only a small part of how that standard (or the AIRR software) is used in the research process/pipeline for a specific paper.

In the case of iReceptor, the paper was written in 2018 by the people on the project at that time. Since 2018 there have been many contributors to the software but they get no credit for that work when the paper is cited. It would be the same for the entire Immcantation suite of tools as well I suspect.

Think of reading the "Methods" for a paper but never having to do a Google search to find the exact release of the software that was used (since there is a DOI that you can click on) - let alone the fact that you may never find it. Multiply that by every piece of software used in every paper.

See:

Knowles et al., “We need to talk about the lack of investment in digital research infrastructure”, Nature Computational Science 1, no. 3 (2021): 169-171, https://www.nature.com/articles/s43588-021-00048-5

and for a more lighthearted view: https://xkcd.com/2347/

There is some very interesting work going on to try and quantify the value of research software, and the reason it is so hard is that software is only "mentioned" in papers and not cited.

Istrate et al., “A large dataset of software mentions in the biomedical literature”, https://arxiv.org/abs/2209.00693

bussec commented 5 months ago

I have now had a closer look at this, but I could not figure out why the v1.4.1 and v1.5.0 releases did not trigger the archiving routine. Potentially the token expired or was accidentally deleted. I then created a new webhook for the AIRR Standards repo, which has been successfully tested, and used it to manually push the v1.4.1 and v1.5.0 releases to Zenodo (using https://github.com/zenodo/zenodo/issues/1463 as a cheat sheet). So I think this issue is solved now. We should, however, confirm that the next release does trigger the archiving again.
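
To make that confirmation easy to repeat, something like the following Python sketch could compare the release tags with what has been archived. The helper names are hypothetical and the two lists are typed in by hand for illustration; how they are collected from GitHub and Zenodo is left out here.

```python
def normalize(tag):
    """Strip a leading 'v' so 'v1.4.1' and '1.4.1' compare equal."""
    return tag[1:] if tag.startswith("v") else tag


def missing_from_archive(release_tags, archived_versions):
    """Return the release tags that have no matching archived version."""
    archived = {normalize(v) for v in archived_versions}
    return [tag for tag in release_tags if normalize(tag) not in archived]


# Illustrative inputs only: the state before the manual push described above.
releases = ["v1.3.1", "v1.4.1", "v1.5.0"]
archived = ["v1.3.1"]

print(missing_from_archive(releases, archived))  # ['v1.4.1', 'v1.5.0']
```

An empty list after the next tagged release would indicate that the webhook fired as expected.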

bcorrie commented 1 month ago

@bussec can we close this? Create a new issue if next release doesn't work...

javh commented 1 week ago

IIRC, this was resolved.