ImperialCollegeLondon / R2T2

Research References Tracking Tool
MIT License
14 stars, 155 forks

Parse code and documentation for arbitrary references #15

Open jezcope opened 4 years ago

jezcope commented 4 years ago

Fun, challenging and very valuable, but probably more effort than it's worth.

de-code commented 4 years ago

Sounds good

dalonsoa commented 4 years ago

Things that could be searched for:

de-code commented 4 years ago

Hi @dalonsoa, how would this fit into the whole process, i.e. how would it be used via the command line? (Maybe it would be good to add example files to the project.)

Say we have the file examples/docstring_references.py:

def some_function():
    """
    Using algorithm introduced by 10.1234/zenodo.1234567
    """

and run:

python -m r2t2 --static --format=markdown examples/docstring_references.py

Should the output be something like:

Referenced in: some_function  
Source: [examples/docstring_references.py](examples/docstring_references.py:1)  
Line: 1

    [1] 10.1234/zenodo.1234567

Would you expect this to be activated via --static or a dedicated switch?
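
To make the question concrete, here is a minimal sketch of how such a static pass might collect DOIs from docstrings. It is not R2T2's implementation: the helper name `find_doi_references` and the simplified DOI pattern are assumptions made for illustration only.

```python
import ast
import re

# Simplified DOI pattern (an assumption; real DOIs allow more characters).
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s<>"]+')


def find_doi_references(source, filename="<string>"):
    """Yield (name, line number, DOI) for DOIs found in docstrings.

    Hypothetical helper, not part of r2t2.
    """
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            docstring = ast.get_docstring(node) or ""
            for match in DOI_PATTERN.finditer(docstring):
                # Drop punctuation that commonly trails a DOI in prose.
                yield node.name, node.lineno, match.group(0).rstrip(".,;)")


EXAMPLE_SOURCE = '''
def some_function():
    """
    Using algorithm introduced by 10.1234/zenodo.1234567
    """
'''

for name, line, doi in find_doi_references(EXAMPLE_SOURCE):
    print(f"Referenced in: {name} (line {line}): {doi}")
```

Because the source is only parsed with `ast` and never imported, nothing in the scanned file is executed, which seems to fit a static mode.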

de-code commented 4 years ago

The Sphinx-bibtex documentation lists the \cite syntax for LaTeX (without quotes), but :cite: for Sphinx.

e.g. Sphinx:

See :cite:`1987:nelson` for an introduction to non-standard analysis.

LaTeX:

See \cite{1987:nelson} for an introduction to non-standard analysis.

Not sure whether "an introduction to non-standard analysis" should be considered the short purpose in this case (if it's being written that way).
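
As a rough illustration, both syntaxes could be matched with one regular expression. This is only a sketch: `find_citation_keys` is a hypothetical helper, and the pattern assumes keys contain no backticks or closing braces.

```python
import re

# Group 1: Sphinx role  :cite:`key`  /  group 2: LaTeX  \cite{key1, key2}
CITE_PATTERN = re.compile(r":cite:`([^`]+)`|\\cite\{([^}]+)\}")


def find_citation_keys(text):
    """Return citation keys in order of appearance (hypothetical helper)."""
    keys = []
    for match in CITE_PATTERN.finditer(text):
        raw = match.group(1) or match.group(2)
        # Both syntaxes allow several comma-separated keys.
        keys.extend(key.strip() for key in raw.split(","))
    return keys


print(find_citation_keys("See :cite:`1987:nelson` for an introduction."))
print(find_citation_keys(r"See \cite{1987:nelson} for an introduction."))
```

Extracting the short purpose would need more than this, since the pattern only recovers the keys and not the surrounding sentence.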

de-code commented 4 years ago

The LaTeX syntax causes a linting error in Python files that use it: W605 invalid escape sequence '\c'
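
One way around the warning, assuming the author is willing to change the docstring prefix, is a raw docstring. In a raw string the backslash is not treated as the start of an escape sequence, so W605 should not apply. The function name below is made up for the example.

```python
def documented_function():
    r"""
    See \cite{1987:nelson} for an introduction to non-standard analysis.
    """


# The backslash is kept literally, so a parser can still find the citation.
print(documented_function.__doc__)
```

A parser that reads docstrings through `ast` or `__doc__` sees the same text either way, so supporting the raw form needs no extra handling.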

de-code commented 4 years ago

An example of someone using embedded bibtex:

https://github.com/kermitt2/delft/blob/18cb340fbea896ff709b1934c8087146fdd696ca/delft/sequenceLabelling/models.py#L435-L442

    For reference:
    --
    @article{devlin2018bert,
      title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
      author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
      journal={arXiv preprint arXiv:1810.04805},
      year={2018}
    }
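
Embedded entries like this one could be pulled out by finding the entry header and then counting braces until they balance. The sketch below assumes well-formed entries with no braces hidden inside quoted strings; `find_bibtex_entries` is a hypothetical name, and a real BibTeX parser would be more robust.

```python
import re

# Matches the start of an entry, e.g. "@article{devlin2018bert,"
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def find_bibtex_entries(text):
    """Return (entry type, key, raw entry) tuples (hypothetical helper)."""
    entries = []
    for match in ENTRY_START.finditer(text):
        depth = 0
        # Walk forward from the "@" until the opening brace is closed again.
        for index in range(match.start(), len(text)):
            char = text[index]
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    raw = text[match.start():index + 1]
                    entries.append((match.group(1), match.group(2), raw))
                    break
    return entries


DOCSTRING = """
For reference:
--
@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
"""

for entry_type, key, _raw in find_bibtex_entries(DOCSTRING):
    print(entry_type, key)
```

Entries whose braces never balance are skipped rather than reported, which seems the safer default when scanning arbitrary docstrings.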

de-code commented 4 years ago

Perhaps this issue should be closed in favour of more specific issues to further improve it? (e.g. extract short purpose)

de-code commented 4 years ago

Searching for cite in Python code on GitHub seems to just bring back the same test citation using HTML:

<cite data-cite="granger">Granger</cite>

de-code commented 4 years ago

Some Markdown examples:

https://github.com/simoninireland/introduction-to-epidemics/blob/e12db123b2b1cf75bbf55ae6a4eb0facfcc22a69/src/reading.md

{cite}`Spi18`

https://github.com/rywyr/TabideaAPI/blob/7684f98635372ca000cdf800ddd3be63c9a0f3b3/vendor/bundler/ruby/2.5.0/gems/maruku-0.7.3/spec/block_docs/cites.md

\cite{chacaltana:2010ks, MR3046557, fubar,Chacaltana:2014ica}

or https://github.com/kenn44/monotonic-schemes-m2imsp/blob/1d9644912a9d71f1576ea4657b096b678908d0ea/README.md

\cite{Tannor}

or https://github.com/waltonjones/or_methods_authorea/blob/d0796628f70423f7360e2c365d12059aff71b11d/introduction.md

\cite{Vosshall_1999}\cite{Clyne_1999}\cite{Gao_1999}

https://github.com/coolx28/ThreatHunter-Playbook/blob/0a860507a7c5638419a6f77b10bd3d49adcf8867/adversary_attribution/APT12.md

[[CiteRef::Meyers Numbered Panda]]

or https://github.com/coolx28/ThreatHunter-Playbook/blob/0a860507a7c5638419a6f77b10bd3d49adcf8867/adversary_attribution/Stealth%20Falcon.md

[[CiteRef::Citizen Lab Stealth Falcon May 2016]]

https://github.com/dustinvtran/blog/blob/f5081b4f890c4ff3c6d48539f371496bbc75a890/_posts/2017-08-07-my-qualifying-exam-oral.md

{% cite mansinghka2014venture --style apa-text --file 2017-08-07 %}

or https://github.com/mrucker/markrucker.net/blob/69691e490b405c9332496c817ad27d3982b6b051/_posts/2020-05-11-primer-recommender-systems.md

{% cite goldberg1992using %}

or https://github.com/eddyerburgh/notes/blob/098086a2e5f64ac26fca6895f7df389b91ec9c04/docs/computer-networking/internet/http2.md

{% cite hpbn -l 207 %}

https://github.com/cnsuhao/engine1/blob/4b928612290150c2a3e0455e38e52d13d90a7340/docs/Methodology/CardiovascularMethodology.md

@cite taylor1999predictive @cite wan2002one @cite formaggia2003one

https://github.com/greentfrapp/project-asimov/blob/317bae1ba4ec46140025e793132a79426e2401b1/guide_fairness_fatpet.md

<dt-cite cite="dwork2012fairness"></dt-cite>

or https://github.com/greentfrapp/project-asimov/blob/76cd7a9e05b3f37e9302bed62a0f4c4c95900202/guide_bias_harms.md

<dt-cite cite="susskind2018future"></dt-cite>

https://github.com/andsor/notebooks/blob/7c4c6695cd48655b4cf6f158cbafd27649c78df8/src/nelder-mead.md

<cite data-cite="Nelder1965Simplex">([Nelder & Mead, 1965])</cite>

https://github.com/trilinos/trilinos.github.io/blob/d82edb4b0e38163a8888d00f101c09d57276bb8a/pages/packages/mpi/ml/ml_citation.md

<cite>arXiv.org, arXiv:0907.4863v1 [physics.comp-ph]</cite>, 2009\. ([link](http://arxiv.org/abs/0907.4863v1))

or https://github.com/chinapedia/wikipedia.ja/blob/95358c8263933cca36c7efb8a956fc0e29cae487/Page/%E8%9C%82%E5%B7%A3%E7%82%8E.md

<cite class="citation journal">Vary, JC; O'Connor, KM (May 2014). </cite>

Or rather plain text with a link: https://github.com/OpenNFT/opennft.github.io/blob/c4675b9249bded263d7fd3f576f461a47974af6d/About.md

For open-source OpenNFT code and applied real-time data processing and software features, cite [Koush et al., 2017, Neuroimage 157:489-503](http://www.sciencedirect.com/science/article/pii/S1053811917305050)

The HTML <cite> tag often seems to be used for non-scientific links, e.g. https://github.com/jackwillis/militanthistory/blob/d942757eacefb720e6eb855b12486affa22e2435/collections/_encyclopedia/lm/newspaper.md:

<cite>Labor Militant</cite>

or https://github.com/leonp/leonp.github.io/blob/a694f9062341e6f38ed989bdef8645b7b6397ed5/journal/thinking/_posts/2014-11-22-turner.md

<cite>Mr Turner</cite>
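
The keyed styles collected above could be covered by a small table of patterns, sketched below. The style names and the `find_citations` helper are invented for this example, and the patterns are tuned only to the samples in this comment. Plain `<cite>…</cite>` is deliberately left out because it is so often used for non-scientific titles.

```python
import re

# Hypothetical pattern table; names and coverage are illustrative only.
CITE_PATTERNS = {
    "brace_role": re.compile(r"\{cite\}`([^`]+)`"),
    "latex": re.compile(r"\\cite\{([^}]+)\}"),
    "cite_ref": re.compile(r"\[\[CiteRef::([^\]]+)\]\]"),
    "percent_tag": re.compile(r"\{%\s*cite\s+(\S+)[^%]*%\}"),
    "at_cite": re.compile(r"@cite\s+(\S+)"),
    "dt_cite": re.compile(r'<dt-cite\s+cite="([^"]+)"'),
    "data_cite": re.compile(r'<cite\s+data-cite="([^"]+)"'),
}


def find_citations(text):
    """Return (style, raw key text) pairs, grouped by style in table order."""
    return [
        (style, match.group(1).strip())
        for style, pattern in CITE_PATTERNS.items()
        for match in pattern.finditer(text)
    ]


print(find_citations("{% cite hpbn -l 207 %}"))
print(find_citations("@cite taylor1999predictive @cite wan2002one"))
```

For the `\cite{…}` form the captured text may still hold several comma-separated keys, so a caller would need to split it further.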

de-code commented 4 years ago

For R:

https://github.com/M-E-Rademaker/cSEM/blob/c0a6353329eee2c62c494e95229906a742dbc3d4/dev/gsca_gscam_description.R

#' in \insertCite{Hwang2014;textual}{cSEM}, p. 75, the authors set '.iter_max'

or https://github.com/cran/equivalenceTest/blob/aae9bb4ee112a8fd7b69dc719e726b82c9ffd16a/R/equivalenceTest.R

# The first is discussed by \insertCite{tsong2017development;textual}{equivalenceTest} and the second by \insertCite{weng2018improved;textual}{equivalenceTest}.

https://github.com/stephens999/multivariate/blob/37eaf8365ea1214e06d8beb7404a5b1da7dfb076/globallipids/GLC.R

\cite{huang:2007}

or https://github.com/cran/JADE/blob/f279950fad445efbbdf3317413ce55454c329904/man/FOBI.Rd


\references{
\cite{Cardoso, J.-F. (1989), Source separation using higher order moments, in Proceedings of {IEEE} International Conference
on Accoustics, Speech and Signal Processing, 2109--2112.}
 
\cite{Miettinen, J., Taskinen S., Nordhausen, K. and Oja, H. (2015), Fourth Moments and Independent

or https://github.com/cran/metap/blob/0aad6738aa16c9950f779a080c88588413f6bb8e/man/sumlog.Rd

of studies \insertCite{fisher25}{metap}.
\insertNoCite{becker94}{metap}
\insertNoCite{rosenthal78}{metap}
\insertNoCite{sutton00}{metap}

or https://github.com/jeetsukumaran/2015-SSB-AnnArbor-Biogeography/blob/07f867ac77c00af8f5476f45ad2a81d651b7e77a/biogeobears/libexec/BioGeoBEARS_generics_v1.R