BaderLab / saber

Saber is a deep-learning based tool for information extraction in the biomedical domain. Pull requests are welcome! Note: this is a work in progress. Many things are broken, and the codebase is not stable.
https://baderlab.github.io/saber/
MIT License

saber.load_dataset() should be able to pull from pubannotation. #146

Open JohnGiorgi opened 5 years ago

JohnGiorgi commented 5 years ago

saber.load_dataset() should be able to pull from pubannotation.org given a project's URL.

E.g.

saber.load_dataset('http://pubannotation.org/projects/AGAC_training/annotations.tgz')

should download the dataset to ~/saber/datasets, convert it to the CoNLL 2003 format, and load it into a Dataset object. Furthermore, if this URL is ever supplied again, load_dataset() should use the cached version of the dataset in ~/saber/datasets.
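The download-and-cache part of this could look roughly like the sketch below. It is only an illustration, not Saber's actual code: the helper names (project_name_from_url, download_to, fetch_dataset) are hypothetical, it assumes the URL path has the form /projects/<name>/annotations.tgz as in the example above, and it leaves out the CoNLL 2003 conversion and Dataset loading entirely. The downloader is passed in as an argument so the caching logic can be exercised without network access.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Cache location taken from the issue text (~/saber/datasets).
DEFAULT_CACHE_DIR = Path.home() / "saber" / "datasets"


def project_name_from_url(url):
    """Return the project name from a PubAnnotation annotations URL.

    Assumption: the URL path looks like /projects/<name>/annotations.tgz.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "projects":
        raise ValueError(f"Not a PubAnnotation project URL: {url}")
    return parts[1]


def download_to(url, destination):
    """Default downloader: fetch `url` and write it to `destination`."""
    urllib.request.urlretrieve(url, str(destination))


def fetch_dataset(url, cache_dir=DEFAULT_CACHE_DIR, downloader=download_to):
    """Return the directory holding the extracted archive for `url`.

    If the project was fetched before, the cached directory is returned
    and nothing is downloaded. A real implementation would also need to
    clean up after a failed or partial extraction.
    """
    cache_dir = Path(cache_dir)
    dataset_dir = cache_dir / project_name_from_url(url)
    if dataset_dir.is_dir():
        return dataset_dir  # cache hit: reuse the earlier download

    cache_dir.mkdir(parents=True, exist_ok=True)
    archive = cache_dir / f"{dataset_dir.name}.tgz"
    downloader(url, archive)

    dataset_dir.mkdir()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dataset_dir)
    archive.unlink()  # keep only the extracted files
    return dataset_dir
```

With something like this in place, load_dataset() would only need to check whether its argument is a pubannotation.org URL, call fetch_dataset(), and hand the returned directory to the conversion step.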

Since pubannotation.org hosts most of the popular BioNLP datasets, this would nearly eliminate the need to maintain datasets locally.