bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Fix unit test to run local PRs + fix tutorial #850

Closed hakunanatasha closed 1 year ago

hakunanatasha commented 1 year ago

Enables unit testing of local scripts with the --test_local flag; borrows from the test_bigbio_hub.py script. I tested this by making a copy of scitail as test_scitail in the biodatasets folder, and

To replicate:

Note: the contributions guide references the templates folder, which has 2 scripts: bigbiohub and a template file that can be used to fill in the blanks. To avoid shipping a deprecated script, maybe we should automate this so that on every new update bigbiohub is copied into the templates folder, OR I can just change the tutorial to reflect its actual "default" location of bigbio/hub/bigbiohub.py
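For the automation option, a minimal sketch of what the copy step could look like. Only the bigbio/hub/bigbiohub.py source path comes from this thread; the helper name `sync_bigbiohub` and any destination path passed to it are assumptions for illustration, not existing project code.

```python
import shutil
from pathlib import Path


def sync_bigbiohub(source: Path, destination: Path) -> bool:
    """Copy `source` over `destination` when their contents differ.

    Hypothetical helper: returns True if a copy was made, False if the
    destination was already identical to the source.
    """
    if destination.exists() and destination.read_bytes() == source.read_bytes():
        return False
    # Create the destination folder if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    return True


# Example call (the destination path is an assumed location):
# sync_bigbiohub(Path("bigbio/hub/bigbiohub.py"), Path("templates/bigbiohub.py"))
```

Running something like this from CI or a pre-release hook would keep the template copy from drifting; changing the tutorial to point at the default location avoids the duplicate file altogether.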

TODO: on 2023/01/03 I'm going to add one more small change that also tests whether the METADATA is in the acceptable set of values to ensure standardization!
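A rough sketch of what that metadata check could look like. The field names, the allowed values, and the function name `find_invalid_metadata` are all placeholders chosen for illustration; the project's real accepted values are not listed in this thread.

```python
from typing import Dict, List, Set

# Illustrative vocabularies only (assumed, not the project's actual sets).
ALLOWED_VALUES: Dict[str, Set[str]] = {
    "language": {"EN", "ES"},
    "task": {"NER", "QA"},
}


def find_invalid_metadata(metadata: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, for each checked field, the values outside its allowed set."""
    invalid: Dict[str, List[str]] = {}
    for field, allowed in ALLOWED_VALUES.items():
        bad = [value for value in metadata.get(field, []) if value not in allowed]
        if bad:
            invalid[field] = bad
    return invalid
```

A unit test could then assert that the returned dictionary is empty for each dataset script, so a non-standard value fails with the offending field named.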

@galtay

galtay commented 1 year ago
  • copy the scitail folder in bigbio/biodatasets as cp -r bigbio/biodatasets/scitail bigbio/biodatasets/test_scitail.

I thought we were trying to test the data loader scripts in this directory? https://github.com/bigscience-workshop/biomedical/tree/main/bigbio/hub/hub_repos

hakunanatasha commented 1 year ago

Superseded by #856