Closed: fpcsong closed this issue 1 year ago
The script for bloomz-1b7 is here: https://github.com/bigscience-workshop/bigscience/blob/master/train/tr13-mtf/smaller_models/tr13b-1b3-ml-xp3capmixnewcodelonglossseq-a100.slurm
It has some important parts like computing loss only over targets & normalizing the loss.
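In case a concrete illustration of those two parts helps: below is a minimal, framework-agnostic sketch (in numpy, not the actual Megatron-DeepSpeed code) of computing cross-entropy only over target tokens and normalizing by each example's target length. The function name and mask convention are my own, chosen for the example.

```python
import numpy as np

def target_only_loss(logits, labels, target_mask):
    """Cross-entropy over target (completion) tokens only.

    logits:      (batch, seq, vocab) model outputs
    labels:      (batch, seq) token ids
    target_mask: (batch, seq) 1 where the token belongs to the target,
                 0 for prompt/input tokens and padding
    """
    # Shift so that position t predicts the token at t+1.
    logits = logits[:, :-1, :]
    labels = labels[:, 1:]
    mask = target_mask[:, 1:].astype(float)

    # Numerically stable log-softmax over the vocabulary.
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    # Per-token negative log-likelihood of the true label.
    nll = -np.take_along_axis(logp, labels[..., None], axis=-1).squeeze(-1)

    # Normalize by the number of target tokens per example, then average
    # over the batch, so long targets do not dominate the gradient.
    per_example = (nll * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    return per_example.mean()
```

The key point is that prompt tokens contribute nothing to the loss, and each example is weighted equally regardless of how long its target is.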
Many thanks for your quick response! I have implemented computing the loss only over targets and normalizing the loss. Where can I find finetune_t0.py?
Oh, you need to make sure you are on the t0loading branch,
i.e. https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/t0loading
Thanks a lot! :smile:
Sorry to bother you again :crying_cat_face: , I notice that bloomz was only finetuned for 1000~2000 steps. How was the training data selected?
It's just random shuffling of xP3, however with the language percentages like in this file: https://github.com/bigscience-workshop/xmtf/blob/master/xp3capmixnewcodelong_train.txt
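To make the "random shuffling with language percentages" concrete, here is a rough sketch of sampling a language per training slot according to fixed weights. The weights below are made-up placeholders; the real percentages are the ones in xp3capmixnewcodelong_train.txt.

```python
import random

# Hypothetical language weights for illustration only; the actual
# percentages come from xp3capmixnewcodelong_train.txt in the xmtf repo.
lang_weights = {"en": 0.35, "zh": 0.15, "fr": 0.10, "es": 0.10, "other": 0.30}

def sample_language(rng=random):
    """Draw one language id proportionally to lang_weights."""
    langs, weights = zip(*lang_weights.items())
    return rng.choices(langs, weights=weights, k=1)[0]

# To build a training order: draw a language for each slot, then pop the
# next example from that language's pre-shuffled pool of xP3 examples.
```

Over many draws the realized language mix converges to the target percentages, while within each language the examples are just a random shuffle.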
Thank you for contributing such excellent work. I notice that bloomz-* outperforms bloom-* via instruction tuning, and I want to build a new bloomz-* model on top of a bloom model (e.g. bloom-1b7 -> bloomz-1b7-mt). However, after finetuning the bloom-1b7 model on some instruction data from xP3mt, the performance drops a lot. I use a batch size of 2048 and a learning rate of 2e-5, and labels on the inputs are masked. What else do I need to pay attention to? Or are there scripts for doing this?
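For reference, "labels on inputs are masked" usually means the data-prep convention sketched below: prompt positions get a sentinel label (commonly -100) so the loss function skips them. This is an illustrative sketch, not code from the repo; the tokenizer callback and function name are placeholders.

```python
def build_example(encode, prompt, target, ignore_index=-100):
    """Concatenate prompt and target ids; mask prompt labels.

    encode: a callable mapping a string to a list of token ids
            (stands in for a real tokenizer).
    """
    prompt_ids = encode(prompt)
    target_ids = encode(target)
    input_ids = prompt_ids + target_ids
    # Prompt positions are masked with ignore_index so the loss is
    # computed only over the target tokens.
    labels = [ignore_index] * len(prompt_ids) + list(target_ids)
    return input_ids, labels
```

If the performance drop persists even with this masking in place, it is worth double-checking that the loss is also normalized per example, as in the training script linked above.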