bigscience-workshop / xmtf

Crosslingual Generalization through Multitask Finetuning
https://arxiv.org/abs/2211.01786
Apache License 2.0

Is mT0 suitable for continued training on span corruption task? #2

Closed · junwang-wish closed this 1 year ago

junwang-wish commented 1 year ago

Is mT0 suitable / recommended for continued training on a mixture of denoising tasks (span corruption, extreme span corruption, prefix LM), similar to UL2? For example:

```
# span_corruption
{
  "text_input": "The <extra_id_0> walks in <extra_id_1> park",
  "text_output": "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>"
}

# extreme_span_corruption
{
  "text_input": "The <extra_id_0> park",
  "text_output": "<extra_id_0> cute dog walks in the <extra_id_1>"
}

# prefix_LM
{
  "text_input": "The cute <extra_id_0>",
  "text_output": "<extra_id_0> dog walks in the park"
}
```

My domain text is quite different from typical internet text, so I assume a span corruption task would help mT0 learn the special syntax / semantics of my domain.
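
For concreteness, here is a minimal sketch of how pairs like the ones above could be generated. This is my own illustration, not code from this repo; the single-token-span simplification and the `noise_density` value are assumptions:

```python
# Hypothetical T5-style span corruption, assuming mT0's <extra_id_N> sentinels.
import random

def span_corrupt(tokens, noise_density=0.15, seed=0):
    """Mask random tokens with sentinels; return (text_input, text_output)."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * noise_density))
    masked = set(rng.sample(range(len(tokens)), n_mask))
    inp, tgt, sid = [], [], 0
    prev_masked = False
    for i, tok in enumerate(tokens):
        if i in masked:
            if not prev_masked:                  # open a new masked span
                inp.append(f"<extra_id_{sid}>")
                tgt.append(f"<extra_id_{sid}>")
                sid += 1
            tgt.append(tok)                      # masked content goes to the target
            prev_masked = True
        else:
            inp.append(tok)
            prev_masked = False
    tgt.append(f"<extra_id_{sid}>")              # closing sentinel, as in T5
    return " ".join(inp), " ".join(tgt)

text_input, text_output = span_corrupt("The cute dog walks in the park".split(), seed=1)
```

Raising `noise_density` and the mean span length would approximate the "extreme" variant.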

sbmaruf commented 1 year ago

I think BLOOM might be a good candidate for that. After UL2 training, you might want to try instruction tuning as in BLOOMZ, FLAN, or T0. But a good workaround could be to (i) include instruction-tuning samples (xP3mt, P3, etc.) under the prefix-LM objective, and (ii) include the other objectives, like span corruption, and continue UL2-style training; see the sketch below.
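
Not an official recipe, but one way (i) and (ii) could be wired up with the `datasets` library. The `"en"` config, column names, mixing rates, the domain file, and the `span_corrupt` helper from the previous comment are all assumptions:

```python
# Hypothetical mixture: interleave xP3 instruction data (as a prefix-LM
# objective) with span-corrupted domain text for continued UL2-style training.
from datasets import load_dataset, interleave_datasets

xp3 = load_dataset("bigscience/xP3", "en", split="train", streaming=True)
domain = load_dataset("text", data_files={"train": "my_domain.txt"},
                      split="train", streaming=True)

# xP3 rows already look like prefix-LM pairs: "inputs" -> "targets".
xp3 = xp3.rename_columns({"inputs": "text_input", "targets": "text_output"})

def corrupt(ex):
    # Reuse the span_corrupt sketch from the previous comment.
    inp, tgt = span_corrupt(ex["text"].split())
    return {"text_input": inp, "text_output": tgt}

domain = domain.map(corrupt, remove_columns=["text"])

# Sample the two objectives at the given rates into one training stream.
mixed = interleave_datasets([xp3, domain], probabilities=[0.5, 0.5], seed=0)
```

The resulting `mixed` stream can then be tokenized and fed to a standard seq2seq training loop.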

junwang-wish commented 1 year ago

Thanks!