Closed by MuroriM 3 years ago
Adds the arXiv and XSum datasets. I'm still testing the arXiv dataset because of its enormous size, so if there's an error in the code, I'll push more commits to fix it.
The goal is to have all commits in by tomorrow morning.
Adds the PubMedQA dataset to cover the medical domain.
Modifies: dataset/huggingface_datasets.py and dataset/__init__.py
Nice work! Are all five datasets done and equipped with tests?
Added the SummScreen and QMSum datasets. These changes affect dataset/non_huggingface_datasets.py and dataset/__init__.py.
Made a few minor fixes in:
huggingface_datasets.py - corrects the types of certain SummInstance variables
tests/dataset_test.py - fixes the check for whether SummInstance is a list or a string
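For anyone following along, the list-or-string check mentioned above could look roughly like the sketch below. This is not the code from this PR: the `SummInstance` class here is a hypothetical stand-in, and its field names (`source`, `summary`, `query`) as well as the helpers `is_str_or_str_list` and `check_instance` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class SummInstance:
    """Hypothetical stand-in for the project's SummInstance; field names are assumed."""

    source: Union[str, List[str]]
    summary: Union[str, List[str]]
    query: Optional[str] = None


def is_str_or_str_list(value) -> bool:
    """Return True if value is a string or a list containing only strings."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def check_instance(instance: SummInstance) -> bool:
    """Check that source and summary each hold a string or a list of strings."""
    return is_str_or_str_list(instance.source) and is_str_or_str_list(instance.summary)
```

A test in the style of tests/dataset_test.py could then call `check_instance` on the first few instances of each dataset, which keeps the check cheap even for a very large dataset.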