christophergandrud / repmis

Miscellaneous tools for reproducible research with R.

LoadandCite does not respect package's recommended citation mechanism #19

Open cboettig opened 7 years ago

cboettig commented 7 years ago

R packages have a mechanism that lets their developers indicate how they wish to be cited. This can be particularly important in making sure that a relevant software paper is cited, rather than, or in addition to, the package itself, since for many academic developers only citations to the software paper have much value.

It looks to me like LoadandCite ignores the existing citation mechanism, e.g. LoadandCite("knitr", file = "refs.bib") does not match the recommended citation of citation("knitr").
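For comparison, here is a minimal sketch of what honoring the recommended citation could look like using only base R's `citation()` and `toBibtex()`. The helper name `write_recommended_citations` is hypothetical and is not part of repmis; the knitr call assumes knitr is installed.

```r
# Hypothetical helper (not part of repmis): write every entry that
# citation() recommends for a package to a BibTeX file.
write_recommended_citations <- function(pkg, file) {
  # citation() reads the package's CITATION file when one exists and
  # otherwise falls back to an entry generated from DESCRIPTION.
  cites <- citation(pkg)
  if (length(cites) > 1) {
    message(pkg, " recommends ", length(cites), " citations; writing all of them.")
  }
  writeLines(toBibtex(cites), file)
  invisible(cites)
}

# Only run the knitr example when knitr is actually installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  write_recommended_citations("knitr", "refs.bib")
}
```

Whether LoadandCite should behave this way is exactly the open question in this issue; the sketch only shows that the recommended entries are available programmatically.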

christophergandrud commented 7 years ago

Do you mean it only includes in refs.bib:

Yihui Xie (2016). knitr: A General-Purpose Package for Dynamic Report
Generation in R. R package version 1.15.1.

and not also:

Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and
Hall/CRC. ISBN 978-1498716963

Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R.
In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing
Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

cboettig commented 7 years ago

Exactly. In some cases the authors indicate that they really want the paper cited, and don't care so much about the package (e.g. see citation("pomp"), among others). I think it's possible that citation() can even omit the package itself (though that would seem silly, perhaps).

christophergandrud commented 7 years ago

I guess there is a user interface issue: How to inform users that there are multiple citations for the package. I would imagine that LoadandCite is typically run in an include=FALSE code chunk so the user might not get a message.

(I had a much longer draft post exploring the fundamentals of "software citations", but this is probably not the venue for that. My preference is that LoadandCite help enable replication. As such, citing the software used itself with version number is paramount. I'm kind of ambivalent about other related works.)

cboettig commented 7 years ago

Yeah, it's a tough issue in general, and I'm really not sure what the best solution is. Surely it seems dubious, though, to override R's default behavior and the package authors' intentions from citation() without even a warning?

I'm with you on the importance of replication, I'm just skeptical that citations are really much help in that regard. If I don't have access to someone's code, then just knowing whether they used v1.1.0 or v1.1.1 of some package isn't going to help much, right?

I'm also sympathetic to the case that if people can show software itself is getting cited, the community as a whole can make a more convincing case that software should be considered a valid product of research, whether or not it's accompanied by a software paper. But I dunno. See conversation over here: https://github.com/ropensci/unconf17/issues/24

christophergandrud commented 7 years ago

  1. Maybe the LoadandCite function should make it clear that it is creating the BibTeX info for citing the package used rather than an ancillary paper/book?

  2. Definitely, what we really need is the source code + the package version numbers.

  3. Good to improve incentives, but my (probably overly) cynical take: it would be weird if a book author asked you to cite a different book than the one you based your research on because of some reason having to do with how their employer counted one type of book compared to another.
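On point 2, a rough sketch of recording the package version numbers alongside the source code, using base R's `packageVersion()`. The package list here is a placeholder assumption; in practice it would be whatever the analysis actually loads.

```r
# Placeholder list of packages used by an analysis (an assumption for
# illustration; these two ship with every R installation).
pkgs <- c("stats", "utils")

# Look up the installed version of each package as a plain string.
versions <- vapply(
  pkgs,
  function(p) as.character(packageVersion(p)),
  character(1)
)

# One "package version" line per package, e.g. for a log or appendix.
version_lines <- paste(names(versions), versions)
print(version_lines)
```

`sessionInfo()` reports the same information for everything currently attached, along with the R version and platform.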

cboettig commented 7 years ago

Yeah, I've never heard of a book author doing something like that either. But citing papers as a placeholder for citing software is probably not seen as that weird. For instance, the original paper for the BLAST algorithm is one of the most cited biology papers of all time, despite the fact that the algorithm everyone is using has evolved significantly and is now operated and maintained by researchers at NCBI and not the authors of the original paper. Perhaps this is about citing the original provenance of an idea, perhaps it's just citing the same bucket everyone else does. Maybe a bit of both.

christophergandrud commented 7 years ago

Yeah, really it seems the citation should go something like:

We use method X implemented in PACKAGE_CITE. The method was originally proposed in PAPER_CITE.

Presumably people using a method have read the paper explaining the method and would cite PAPER_CITE. LoadandCite is really for making sure the PACKAGE_CITE version matches the version actually used for the analysis.
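One way to get both PACKAGE_CITE and PAPER_CITE out of base R is sketched below. It relies on the `auto` argument of `citation()`: `auto = TRUE` generates the entry from the installed package's DESCRIPTION (so it carries the installed version), while the default uses the authors' CITATION file when one exists. The function name `package_and_paper_bib` is hypothetical, and this is not how LoadandCite is implemented.

```r
# Hypothetical helper (not part of repmis): collect the auto-generated,
# versioned package entry plus whatever the authors recommend.
package_and_paper_bib <- function(pkg) {
  # PACKAGE_CITE: generated from DESCRIPTION, ignoring any CITATION file.
  package_bib <- as.character(toBibtex(citation(pkg, auto = TRUE)))
  # PAPER_CITE (and friends): the authors' recommended entries.
  recommended_bib <- as.character(toBibtex(citation(pkg)))

  # Without a CITATION file both calls return the same entry; keep one copy.
  if (identical(package_bib, recommended_bib)) {
    return(package_bib)
  }
  c(package_bib, "", recommended_bib)
}

# Only run the knitr example when knitr is actually installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  writeLines(package_and_paper_bib("knitr"))
}
```

When a CITATION file already includes the package entry, the output may contain a near-duplicate that needs manual cleanup, and both entries may share a BibTeX key; the sketch does not try to resolve either.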