Closed Sanqiang closed 3 years ago
@danyaljj I was having a look at the GCP bucket you've shared for the pre-processed dataset. For some datasets I see quite a few folder variants, such as _dbert, _dbidaf, _dev, _droberta, _v1, _v2, _v3, _l, _m, _s. What do these variants mean? Also, which exact folders did you use for training and evaluating unifiedqa-v2?
https://github.com/allenai/unifiedqa/blob/master/tasks_v2.py#L189-L215
for some datasets I see quite a few folder variants, like, _dbert, _dbidaf, _dev, _droberta, _v1, _v2, _v3, _l, _m, _s
You have to be more specific. In many cases, you can figure out the meaning by referring to the dataset paper.
@danyaljj Thank you for your prompt response. The script lines you've pointed to seem to include only the training set (20 datasets) of the UnifiedQA-v2 model. Are the datasets used for evaluation included here? https://github.com/allenai/unifiedqa/blob/master/tasks_v2.py#L117 Also, just to clarify: are you saying that the information about these variants can be found in the respective dataset papers?
I am trying to re-train your model at a much larger scale. Would it be possible for you to share the processed data in your GCP bucket?