arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

Track citations in InferenceData #1671

Open OriolAbril opened 3 years ago

OriolAbril commented 3 years ago

AFAIK, there is no established, widely used system for tracking citations in Python (I hope I'm wrong, though). I propose using the attributes in InferenceData to track this. If possible, converters could add the relevant citations for their library (in addition to its name and version), as well as the citation(s) for the sampling algorithms used. Citations for base ArviZ, xarray, numpy and netcdf4(?) could also be added. Then, whenever the InferenceData is passed to a function, the relevant citations for the algorithms and libraries involved would be added to the attribute. Calling az.summary, for example, would add the citation for pandas as well as citations for the ess, rhat, mcse and hdi algorithms used. Calling loo would add citations for scipy as well as the psis and loo papers, and so on.
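A minimal sketch of the tracking idea, assuming a stand-in class that exposes only an `attrs` dict rather than the real InferenceData. The decorator `track_citations`, the mapping `CITATIONS_BY_FUNCTION` and the citation ids are hypothetical names for illustration, not existing ArviZ API:

```python
from functools import wraps

# Hypothetical mapping from function name to citation ids (not ArviZ API).
CITATIONS_BY_FUNCTION = {
    "summary": ("pandas", "ess", "rhat", "mcse", "hdi"),
    "loo": ("scipy", "psis", "loo"),
}


class FakeInferenceData:
    """Stand-in for InferenceData that only carries an ``attrs`` dict."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)


def track_citations(func):
    """Append the citation ids tied to ``func`` to ``idata.attrs["citations"]``."""

    @wraps(func)
    def wrapper(idata, *args, **kwargs):
        seen = list(idata.attrs.get("citations", []))
        for key in CITATIONS_BY_FUNCTION.get(func.__name__, ()):
            if key not in seen:  # keep each id only once
                seen.append(key)
        idata.attrs["citations"] = seen
        return func(idata, *args, **kwargs)

    return wrapper


@track_citations
def summary(idata):
    return "summary placeholder"  # the real computation is out of scope here


@track_citations
def loo(idata):
    return "loo placeholder"


# A converter would seed the attribute with the base libraries.
idata = FakeInferenceData(citations=["arviz", "xarray", "numpy"])
summary(idata)
loo(idata)
summary(idata)  # a repeated call adds nothing new
print(idata.attrs["citations"])
```

Storing the ids in a list keeps the order in which they were first needed; a set would also work if ordering does not matter.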

Ideas on implementation are welcome. The first things that come to mind are a giant string with all BibTeX citations appended, or a tuple/set attribute with unique ids that can then be passed to az.get_citations to get the actual BibTeX, either as a giant string or written to a file.
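The second option could look roughly like the sketch below. `get_citations` here is a standalone, hypothetical function (az.get_citations does not exist yet), and the registry entries are placeholders rather than real references:

```python
from pathlib import Path

# Hypothetical registry of id -> BibTeX entry; the entries are placeholders.
BIBTEX_REGISTRY = {
    "lib-a": "@misc{lib-a,\n  title = {Placeholder entry for library A}\n}",
    "algo-b": "@misc{algo-b,\n  title = {Placeholder entry for algorithm B}\n}",
}


def get_citations(keys, path=None):
    """Return the BibTeX for ``keys`` as one string, optionally writing it to ``path``."""
    unique = list(dict.fromkeys(keys))  # drop duplicates, keep first-seen order
    missing = [key for key in unique if key not in BIBTEX_REGISTRY]
    if missing:
        raise KeyError(f"no BibTeX entry registered for: {missing}")
    bibtex = "\n\n".join(BIBTEX_REGISTRY[key] for key in unique)
    if path is not None:
        Path(path).write_text(bibtex + "\n", encoding="utf-8")
    return bibtex


print(get_citations(["lib-a", "algo-b", "lib-a"]))
```

Keeping only ids in the attribute keeps the stored metadata small, and the BibTeX text lives in one place where it can be corrected later.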

aloctavodia commented 3 years ago

We could populate a list with unique two-character identifiers, to keep it short, or use name-year (or name-year-letter to disambiguate), to make it more human-friendly. We then ensure the list has no duplicates and get the BibTeX from a dictionary. That should be fine when the workflow is to create a BibTeX file for each project, but it will not help if the workflow relies on accessing a BibTeX database through a reference manager. In that case, I wonder how difficult it would be to provide integration with tools like Zotero?
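A rough sketch of the name-year-letter scheme, assuming each reference is given as a `(surname, year, title)` tuple. The function name `make_keys` and the sample references are made up for illustration:

```python
from string import ascii_lowercase


def make_keys(references):
    """Map name-year(-letter) keys to ``(surname, year, title)`` references."""
    groups = {}
    for ref in dict.fromkeys(references):  # drop exact duplicates first
        surname, year, _title = ref
        groups.setdefault(f"{surname.lower()}{year}", []).append(ref)

    keys = {}
    for base, refs in groups.items():
        if len(refs) == 1:
            keys[base] = refs[0]  # name-year is already unique
        else:
            # Add a letter suffix only when name-year collides
            # (this sketch handles at most 26 collisions per name-year).
            for letter, ref in zip(ascii_lowercase, refs):
                keys[f"{base}{letter}"] = ref
    return keys


# Placeholder references, not real papers.
refs = [
    ("Doe", 2020, "First placeholder title"),
    ("Doe", 2020, "Second placeholder title"),
    ("Roe", 2019, "Third placeholder title"),
    ("Doe", 2020, "First placeholder title"),  # exact duplicate
]
keys = make_keys(refs)
print(list(keys))
```

The resulting dictionary can then play the role of the lookup table from which the BibTeX entries are retrieved.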