The DrugProt dataset loading script `drugprot.py` currently includes only the train and validation splits. However, the original dataset also provides a _testbackground split (see the BioCreative VII Track 1 overview: https://biocreative.bioinformatics.udel.edu/media/store/files/2021/Track1_pos_1_BC7_overview.pdf). The _testbackground split consists of 750 test-set abstracts and 10,000 background-set abstracts.
For this reason, I adjusted the data loading script `drugprot.py` to include the _testbackground split. The related pull request can be found here: https://github.com/bigscience-workshop/biomedical/pull/928
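To illustrate what exposing an extra split involves, here is a minimal, stdlib-only sketch of a loader that accepts a third split name alongside train and validation. It is not the code from the pull request. The split name `test_background`, the helper names `iter_abstracts` and `load_split`, and the assumed abstracts layout (one `PMID<TAB>title<TAB>abstract` row per document) are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List

# Hypothetical split names; the loading script in the PR may name them differently.
SPLITS = ("train", "validation", "test_background")


def iter_abstracts(tsv_text: str) -> Iterator[Dict[str, str]]:
    """Yield one example per row of an assumed PMID<TAB>title<TAB>abstract file."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        pmid, title, abstract = row[0], row[1], row[2]
        # Concatenate title and abstract into a single document text.
        yield {"document_id": pmid, "text": f"{title} {abstract}"}


def load_split(split: str, tsv_text: str) -> List[Dict[str, str]]:
    """Parse the abstracts for one split, rejecting split names the loader does not know."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return list(iter_abstracts(tsv_text))
```

For example, `load_split("test_background", "123\tTitle A\tAbstract A.\n")` returns a single example with `document_id` `"123"`. The same parser serves every split, so adding the new split only extends the set of accepted names and the files read for it.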