Closes #927

kai-car commented 1 month ago

Name: Drugprot
Description: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-1/
Paper: https://biocreative.bioinformatics.udel.edu/media/store/files/2021/Track1_pos_1_BC7_overview.pdf
Data: https://zenodo.org/record/5119892/files/drugprot-training-development-test-background.zip?download=1

The current version of drugprot.py only includes the train and validation splits. I therefore adjusted the drugprot.py data loading script to also load the test_background split, since the corresponding .tsv files are already present in the data folder. Note that the test_background split does not contain any relations.
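A minimal sketch of the idea, not the actual loader code: a split without a relations file still yields one record per abstract, just with an empty relation list. The function name `parse_split`, the record keys, and the assumed column layout (abstracts: PMID, title, abstract; entities: PMID, entity id, type, start, end, text; relations: PMID, relation type, `Arg1:<id>`, `Arg2:<id>`) are illustrative assumptions and should be checked against the files in the archive.

```python
import csv
from collections import defaultdict


def _rows(lines):
    """Yield non-empty tab-separated rows from an iterable of text lines."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row:
            yield row


def parse_split(abstract_lines, entity_lines, relation_lines=None):
    """Build one record per abstract (hypothetical helper).

    relation_lines may be None for a split such as test_background,
    in which case every record gets an empty relation list.
    """
    entities = defaultdict(list)
    for pmid, ent_id, ent_type, start, end, text in _rows(entity_lines):
        entities[pmid].append(
            {"id": ent_id, "type": ent_type, "offsets": (int(start), int(end)), "text": text}
        )

    relations = defaultdict(list)
    for pmid, rel_type, arg1, arg2 in _rows(relation_lines or []):
        # Assumed argument format: "Arg1:T1" and "Arg2:T2"
        relations[pmid].append(
            {"type": rel_type, "arg1_id": arg1.split(":", 1)[1], "arg2_id": arg2.split(":", 1)[1]}
        )

    return [
        {
            "pmid": pmid,
            "title": title,
            "abstract": abstract,
            "entities": entities[pmid],
            "relations": relations[pmid],
        }
        for pmid, title, abstract in _rows(abstract_lines)
    ]
```

Because the arguments are plain iterables of lines, open file handles can be passed directly, and the relations argument is simply omitted for test_background.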

See also HuggingFace pull request: https://huggingface.co/datasets/bigbio/drugprot/discussions/1/files

phlobo commented 1 month ago

Thank you for the PR! The loader script that needs to be adapted is the one under hub_repos, though.

Please take a look at the contribution guide, where you can also find how to format the code and execute tests (currently, the test output doesn't reflect your changes).

kai-car commented 1 month ago

Hi, thanks for the feedback. I adjusted the PR accordingly. This time I followed the steps properly, so it should work now. 👍

phlobo commented 1 month ago

Thank you for your changes!

I'm getting the following error running the unit tests:

AssertionError: Dataloader attribute 'Creative Commons Attribution 4.0 International' not valid for _LICENSE must be one of {'GPL_2p0_WITH_BISON_EXCEPTION', 'PDDL_1p0', ...}

It's not related to your fix, but could you please add the correct license key in your PR? I guess it should be CC_BY_4p0.

Also, I still see some differences when running black. Did you run the formatting step (https://github.com/bigscience-workshop/biomedical/blob/main/CONTRIBUTING.md#5-format-your-code)?

kai-car commented 1 month ago

Hope the code adjustments fix the problems. :)

bigscience-workshop / biomedical

Closes #927 #928