Could you clarify whether you fine-tune a single model on all 6 datasets listed in Table 1 of the paper, or fine-tune separately per language, resulting in four models (one per language)? If the former, do you use any mixing/balancing strategy to account for the disparity in data size across languages? Thanks.