allenai / unifiedqa

UnifiedQA: Crossing Format Boundaries With a Single QA System
https://arxiv.org/abs/2005.00700
Apache License 2.0

Could I have your processed data? #12

Closed Sanqiang closed 3 years ago

Sanqiang commented 3 years ago

I am trying to re-train your work with a much bigger model. Would it be possible for you to share the processed data in your GCP bucket?

iamjanvijay commented 2 years ago

@danyaljj I was looking at the GCP bucket you shared for the pre-processed datasets. For some datasets I see quite a few folder variants, such as _dbert, _dbidaf, _dev, _droberta, _v1, _v2, _v3, _l, _m, _s. What do these variants mean? Also, which exact folders did you use for training and evaluating unified-qa-v2?

danyaljj commented 2 years ago

> Also, what exact folders have you used for training/evaluating the unified-qa-v2?

https://github.com/allenai/unifiedqa/blob/master/tasks_v2.py#L189-L215

> for some datasets I see quite a few folder variants, like, _dbert, _dbidaf, _dev, _droberta, _v1, _v2, _v3, _l, _m, _s

You have to be more specific. In many cases, you can figure out the meaning by referring to the dataset paper.

iamjanvijay commented 2 years ago

@danyaljj Thank you for your prompt response. The lines you linked seem to include only the training set (20 datasets) for the UnifiedQA-v2 model. Are the datasets used for evaluation the ones included here? https://github.com/allenai/unifiedqa/blob/master/tasks_v2.py#L117 Also, just to clarify: do you mean that the information about these variants is given in the respective dataset papers?