ImperialCollegeLondon / R2T2

Research References Tracking Tool
MIT License

[Discussion] Source files and processing references #51

Open dalonsoa opened 4 years ago

dalonsoa commented 4 years ago

The main use case of R2T2 - at least in my mind - is to annotate libraries and then retrieve the references used when running a script that uses those libraries.

Under these circumstances, two comments come to mind:

1) It is important to note that each library should have its own references source file - at least one - and therefore it cannot be indicated by the user when running R2T2. If (let's dream) numpy, scipy and scikit-learn adopt R2T2 and I run a script using them, then when processing the used references, each one should be looked up in the corresponding library's source file.

So, for this to work, we have to enable a way of adding references source files to the BIBLIOGRAPHY object, something like BIBLIOGRAPHY.add_source(path_to_source) in the __init__.py of the library. Then we can probably use the inspect module to figure out which library is adding the source.

2) As @ChasNelson1990 has pointed out in #32, loading the bibtex file each time a bibtex key needs to be processed is expensive and makes no sense. So, whenever a reference is processed, apart from looking up the full reference in the correct source, the loaded source should be cached, so processing further references for the same library does not incur extra I/O operations.
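To make the two points concrete, here is a minimal sketch of what I have in mind. It is not the current R2T2 code: the `Bibliography` class, its `get` method, the `package` argument and the `loader` callable are all hypothetical names used for illustration, and the fake loader stands in for whatever actually parses the bibtex file.

```python
import inspect
from pathlib import Path
from typing import Callable, Dict, List, Optional


class Bibliography:
    """Hypothetical stand-in for the BIBLIOGRAPHY object (not R2T2's real API)."""

    def __init__(self, loader: Callable[[Path], Dict[str, str]]):
        self._loader = loader  # assumed: parses a source file into {key: reference}
        self._sources: Dict[str, Path] = {}  # package name -> source file
        self._cache: Dict[Path, Dict[str, str]] = {}  # source file -> parsed entries

    def add_source(self, path, package: Optional[str] = None) -> None:
        """Register the references source file of the calling package."""
        if package is None:
            # Guess the package from the module that called add_source.
            frame = inspect.currentframe()
            caller = frame.f_back if frame is not None else None
            name = caller.f_globals.get("__name__", "__main__") if caller else "__main__"
            package = name.split(".")[0]
        self._sources[package] = Path(path)

    def get(self, package: str, key: str) -> str:
        """Return the full reference, loading each source file at most once."""
        source = self._sources[package]
        if source not in self._cache:
            self._cache[source] = self._loader(source)
        return self._cache[source][key]


# Usage with a fake loader, so the sketch runs without any file on disk.
load_calls: List[Path] = []


def fake_loader(path: Path) -> Dict[str, str]:
    load_calls.append(path)
    return {"smith2020": "Smith, J. (2020). An example reference."}


bibliography = Bibliography(fake_loader)
bibliography.add_source("references.bib", package="mylib")
first = bibliography.get("mylib", "smith2020")
second = bibliography.get("mylib", "smith2020")  # served from the cache
```

With something like this, a library would call `add_source` once at import time, and the second lookup above does not touch the loader again.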

In summary:

ChasNelson1990 commented 4 years ago

Comments on 1:

dalonsoa commented 4 years ago
ChasNelson1990 commented 4 years ago

I thought new-style modules didn't require __init__.py files anymore?

Also, there's a difference between Python enforcing a language standard and us doing so... Personally, I just think that enforcing things never goes well... However, maybe I'm just being pessimistic and we should do it and see if anybody complains.

Fair point that the toml doesn't come down when we install!

ChasNelson1990 commented 4 years ago

*although I can't find any evidence for my first point right now...

dalonsoa commented 4 years ago

Ok, let's ignore where to put it. Do we agree we need a way for each package to indicate where its reference source is? Does something like BIBLIOGRAPHY.add_source(path_to_source), called somewhere within the library code, look sensible?

ChasNelson1990 commented 4 years ago

I'd be happy with that. It was my original plan when I started yesterday, but I decided to use CLI parameters to keep things in line with what was already being used. :-)

dalonsoa commented 4 years ago

Yep, I guess things evolve with time and the complexity of the code. The CLI is still needed for users to define the output format of the references list, but the input of those references is up to the package using R2T2.
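As a rough illustration of that split, the CLI would only need to know about the script and the output format. The sketch below uses argparse; the program name, the `--format` flag and its choices are assumptions made for the example, not R2T2's actual options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: the user picks the output format, never the sources."""
    parser = argparse.ArgumentParser(prog="refs-sketch")
    parser.add_argument("script", help="script to run while tracking references")
    parser.add_argument(
        "--format",
        choices=["terminal", "markdown"],  # assumed formats, for illustration only
        default="terminal",
        help="output format of the references list",
    )
    return parser


args = build_parser().parse_args(["--format", "markdown", "my_script.py"])
```

Note there is no option for the references source here: that comes from the packages themselves.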

ChasNelson1990 commented 4 years ago

Yea, that makes sense.