bigscience-workshop / xmtf

Crosslingual Generalization through Multitask Finetuning
https://arxiv.org/abs/2211.01786
Apache License 2.0

How to reproduce bloomz-* #18

Closed fpcsong closed 1 year ago

fpcsong commented 1 year ago

Thank you for contributing such excellent work. I notice that the bloomz-* models outperform the bloom-* models via instruction tuning, and I want to build a new bloomz-* model on top of a bloom model (e.g. bloom-1b7 -> bloomz-1b7-mt). However, after finetuning the bloom-1b7 model on some instruction data from xP3mt, performance drops a lot. I use a batch size of 2048 and a learning rate of 2e-5, and labels on inputs are masked. What else do I need to pay attention to? Or are there scripts for doing this?
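For reference, here is a minimal sketch of masking labels on input (prompt) tokens so that loss is only computed over target tokens, assuming the common PyTorch convention of `-100` as the ignore index. This is illustrative only, not the repo's actual code:

```python
# Illustrative sketch: mask prompt tokens so cross-entropy loss is only
# computed over target tokens. -100 is ignored by torch.nn.CrossEntropyLoss
# by default.
import torch

IGNORE_INDEX = -100

def build_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """input_ids: 1-D tensor of prompt tokens followed by target tokens."""
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX  # no loss on the prompt/input tokens
    return labels
```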

Muennighoff commented 1 year ago

The script for bloomz-1b7 is here: https://github.com/bigscience-workshop/bigscience/blob/master/train/tr13-mtf/smaller_models/tr13b-1b3-ml-xp3capmixnewcodelonglossseq-a100.slurm

It includes some important parts, such as computing the loss only over targets and normalizing the loss.
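For illustration, a hedged sketch of target-only loss with per-example normalization, assuming labels already have `-100` on input positions. This is not the Megatron-DeepSpeed implementation, just the idea:

```python
# Illustrative sketch: compute token-level cross-entropy only over target
# positions, then normalize each example's loss by its number of target
# tokens before averaging across the batch.
import torch
import torch.nn.functional as F

def target_normalized_loss(logits, labels, ignore_index=-100):
    """logits: [batch, seq, vocab]; labels: [batch, seq] with inputs set to ignore_index."""
    token_loss = F.cross_entropy(
        logits[:, :-1].transpose(1, 2),  # predict token t+1 from position t
        labels[:, 1:],
        ignore_index=ignore_index,
        reduction="none",
    )  # [batch, seq-1], zero at ignored positions
    target_mask = (labels[:, 1:] != ignore_index).float()
    per_example = (token_loss * target_mask).sum(dim=1) / target_mask.sum(dim=1).clamp(min=1)
    return per_example.mean()
```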

fpcsong commented 1 year ago

Many thanks for your quick response! I have already implemented computing the loss only over targets and normalizing the loss. Where is finetune_t0.py?

Muennighoff commented 1 year ago

Oh, you need to make sure you are on the t0loading branch, i.e. https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/t0loading

fpcsong commented 1 year ago

Thanks a lot! :smile:

fpcsong commented 1 year ago

Sorry to bother you again :crying_cat_face: . I notice that bloomz was only finetuned for 1000~2000 steps. How was the training data selected?

Muennighoff commented 1 year ago

It's just a random shuffle of xP3, but with the language percentages as in this file: https://github.com/bigscience-workshop/xmtf/blob/master/xp3capmixnewcodelong_train.txt
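As an illustrative sketch only, sampling data shards proportionally to per-language weights could look like the snippet below. The actual format of xp3capmixnewcodelong_train.txt may differ; here each line is assumed to be `<weight> <dataset_path>`, and the helper names are hypothetical:

```python
# Hedged sketch: choose which language/dataset shard to draw the next example
# from, proportional to weights read from a ratios file (assumed format:
# "<weight> <dataset_path>" per line).
import random

def load_weights(path):
    names, weights = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            weights.append(float(parts[0]))
            names.append(parts[1])
    return names, weights

def sample_dataset(names, weights, rng=random):
    """Pick a dataset name with probability proportional to its weight."""
    return rng.choices(names, weights=weights, k=1)[0]
```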