Adding scripts compatible with:
Canola - English-Python dataset from web-crawled data
MMLU-PRO - a benchmark of difficult multiple-choice questions
Also created:
get_json_dataset.py - for getting JSON datasets automatically from a URL
get_parquet_dataset.py - for getting Parquet datasets automatically from a URL
And a helper script, get_dataset.sh, in the template folder, with prefilled fields for easily getting new datasets.
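
For readers unfamiliar with the pattern, here is a minimal sketch of what fetching a JSON dataset from a URL can look like. It is not the contents of get_json_dataset.py: the function name fetch_json_dataset, its parameters, and the choice to wrap a single top-level object in a list are all assumptions made for illustration.

```python
import json
import urllib.request
from pathlib import Path


def fetch_json_dataset(url: str, out_path: str) -> int:
    """Download JSON from `url`, write it to `out_path`, return the record count.

    Hypothetical helper for illustration only; not the script from this change.
    """
    # urlopen handles http(s):// URLs and also file:// URLs for local testing.
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8"))

    # Assumption: a dataset is a list of records; wrap a lone object in a list.
    records = data if isinstance(data, list) else [data]

    # Create the destination folder if it does not exist yet.
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

    return len(records)
```

A call such as fetch_json_dataset(url, "data/train.json") would then save the records locally and report how many were written. A Parquet variant would follow the same shape but needs a third-party reader, so it is not sketched here.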