google-research-datasets / clang8

cLang-8 is a dataset for grammatical error correction.

Model finetuning #5

Closed: YovaKem closed this issue 2 years ago

YovaKem commented 2 years ago

Can you clarify if you fine-tune one model on all 6 datasets listed in Table 1 in the paper, or if fine-tuning is done separately per language resulting in four models? If the former is the case, do you use any special mixing/balancing strategy to account for the disparity in data size per language? Thanks.

ekQ commented 2 years ago

Fine-tuning is done separately per language (but the GEC pre-training is done for 101 languages simultaneously).
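
A minimal sketch of the setup described in the answer, assuming the fine-tuning data is a simple list of (language, dataset) pairs. Every fine-tuning run starts from the same multilingual GEC pre-trained checkpoint, and the datasets are grouped so that each language gets its own run. No cross-language mixing or balancing happens at this stage. The dataset names, the language labels, how the six datasets are split across languages, and the checkpoint name are all hypothetical placeholders. The thread only states that six datasets produce four per-language models.

```python
from collections import defaultdict

# Hypothetical placeholders: the thread does not name the datasets or say
# how they are split across languages, only that 6 datasets -> 4 languages.
FINETUNE_DATASETS = [
    ("lang_a", "dataset_1"),
    ("lang_a", "dataset_2"),
    ("lang_a", "dataset_3"),
    ("lang_b", "dataset_4"),
    ("lang_c", "dataset_5"),
    ("lang_d", "dataset_6"),
]


def plan_finetuning_runs(datasets, base_checkpoint):
    """Group datasets by language: one fine-tuning run per language,
    each initialised from the same shared pre-trained checkpoint."""
    by_lang = defaultdict(list)
    for lang, name in datasets:
        by_lang[lang].append(name)
    return {
        lang: {"init_from": base_checkpoint, "datasets": names}
        for lang, names in by_lang.items()
    }


# "gec_pretrained_multilingual" is a hypothetical checkpoint name.
runs = plan_finetuning_runs(FINETUNE_DATASETS, "gec_pretrained_multilingual")
for lang, cfg in runs.items():
    print(lang, cfg["datasets"])
```

Grouping by language keeps each run independent, so no sampling weights are needed to offset the differing amounts of data per language.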

YovaKem commented 2 years ago

Thanks!