Hi @CiaoHe,
Thanks for sharing this great work! I was trying to reproduce the results you report in the paper, where you mention doing full fine-tuning in stage-2. I tried to fully fine-tune the model in stage-2 on 4x A100 40GB GPUs, but it seems they are not enough. Could you share the exact hardware requirements for full fine-tuning the model?
Also, there are about 6.7B trainable parameters in stage-2. Are all of them trained, or only a subset?
Moreover, have you tried training stage-2 with LoRA? What are your thoughts on training stage-2 and stage-3 with LoRA?
In stage-2, we trained via full fine-tuning on 8 A6000 GPUs (48GB each), so 4x A100 40GB is quite challenging.
We also tried training with LoRA and ran some ablations (see Figure 6 and Appendix C.1). If you don't need to maintain the model's ability on other general tasks (such as reasoning and knowledge evaluation), we suggest full fine-tuning.
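As a rough illustration of why 4x A100 40GB is tight for fully fine-tuning a ~6.7B-parameter model, here is a back-of-the-envelope memory estimate. It assumes mixed-precision training with the Adam optimizer (a common setup); the actual footprint depends on the framework, sharding strategy (e.g. ZeRO), and activation memory, which are not counted here.

```python
def model_state_memory_gb(num_params: float) -> float:
    """GPU memory for model states only (weights, grads, optimizer states).

    Per-parameter cost under mixed precision with Adam:
      2 bytes fp16 weights + 2 bytes fp16 grads
      + 4 bytes fp32 master weights + 4 + 4 bytes Adam moments
      = 16 bytes per parameter.
    Activations and framework overhead come on top of this.
    """
    return num_params * 16 / 1024**3

mem = model_state_memory_gb(6.7e9)
print(f"~{mem:.0f} GB for model states alone")
```

That is roughly 100 GB for model states before activations. Sharded across 8x A6000 (384 GB total) it fits with headroom; across 4x A100 40GB (160 GB total) it becomes tight once activations and overhead are included, consistent with the experience above.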