bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Unit tests for tests.test_bigbio_hub bioasq_task_b are failing #924

Status: Open. Opened by mart1nro 2 weeks ago.


Describe the bug

The test "runTest (__main__.TestDataLoader) [Check multiple choice]" fails because "choices" is set to an empty list ([]) for every record in _generate_examples, including yes/no questions: https://github.com/bigscience-workshop/biomedical/blob/5c9e606097844db49ca2f7151e2a349f67c0d2cd/bigbio/hub/hub_repos/bioasq_task_b/bioasq_task_b.py#L791
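A minimal sketch of the kind of fix this implies, assuming that yes/no questions should expose the choices "yes" and "no" while the other BioASQ question types keep an empty list. The helper name bioasq_choices is hypothetical and is not part of bioasq_task_b.py.

```python
def bioasq_choices(question_type: str) -> list[str]:
    """Return answer choices for a BioASQ question type (hypothetical helper).

    Assumption: only yes/no questions have a fixed set of choices;
    factoid, list and summary questions keep an empty list.
    """
    if question_type == "yesno":
        return ["yes", "no"]
    return []


# Sketch of use when building a record in _generate_examples:
record = {"type": "yesno", "choices": bioasq_choices("yesno")}
```

Whether the multiple-choice check expects exactly these strings depends on the test itself, so the values above are an assumption to verify against tests/test_bigbio_hub.py.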

The tests for bioasq10b fail with datasets.exceptions.DatasetGenerationError. BioASQ-training10b.zip contains no "BioASQ-training10b" folder, so the hard-coded path "BioASQ-training10b/training10b.json" does not exist: https://github.com/bigscience-workshop/biomedical/blob/5c9e606097844db49ca2f7151e2a349f67c0d2cd/bigbio/hub/hub_repos/bioasq_task_b/bioasq_task_b.py#L698

The path should be "training10b.json", following the same pattern as bioasq_8b.
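To check an archive's layout before changing the path, its member names can be listed with the standard-library zipfile module. The sketch below uses a hypothetical helper, find_member, that returns the member with a given base name whether or not it sits inside a top-level folder; only the file names come from this report, the rest is illustrative.

```python
import zipfile
from pathlib import PurePosixPath


def find_member(zip_path: str, filename: str) -> str:
    """Return the archive member whose base name is `filename` (hypothetical helper).

    Works whether the file sits at the archive root ("training10b.json")
    or inside a folder ("BioASQ-training10b/training10b.json").
    """
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Zip member names always use forward slashes.
            if PurePosixPath(name).name == filename:
                return name
    raise FileNotFoundError(f"{filename} not found in {zip_path}")
```

Per the report, find_member("BioASQ-training10b.zip", "training10b.json") would be expected to return "training10b.json", since the archive has no top-level folder.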

Steps to reproduce the bug

Download the Task B dataset zip files from http://participants-area.bioasq.org/datasets/ and place them in a data directory.

python -m tests.test_bigbio_hub bioasq_task_b --data_dir /home/robert/Desktop/bioasq --test_local

Expected results

All tests pass.

Actual results

See the attached test_output.txt.

Environment info