Hi @CiaoHe,
Thanks for sharing this great work! I was trying to reproduce the results you report in the paper, where you mention doing full fine-tuning in stage-2. I tried to fully fine-tune the model in stage-2 on 4x A100 40GB GPUs, but it seems they are not enough. Could you share the exact hardware requirements for full fine-tuning the model?
Also, there are about 6.7B trainable parameters in stage-2. Are all of them trained, or only a subset?
Moreover, have you tried training stage-2 with LoRA? What are your thoughts on training stage-2 and stage-3 with LoRA?
In stage-2, we trained via full fine-tuning on 8 A6000 GPUs (48GB each), so 4x A100 40GB is quite challenging.
We also tried training with LoRA and ran some ablations (see Figure 6 and Appendix C.1). If you don't need to maintain the model's ability on other general tasks (such as reasoning and knowledge evaluation), we suggest full fine-tuning.
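As a rough illustration of why 4x A100 40GB is tight for fully fine-tuning a ~6.7B-parameter model, here is a back-of-the-envelope memory estimate. It assumes mixed-precision training with the Adam optimizer (a common setup); the actual footprint depends on the framework, sharding strategy (e.g. ZeRO), and activation memory, which are not counted here.

```python
def model_state_memory_gb(num_params: float) -> float:
    """GPU memory for model states only (weights, grads, optimizer states).

    Per-parameter cost under mixed precision with Adam:
      2 bytes fp16 weights + 2 bytes fp16 grads
      + 4 bytes fp32 master weights + 4 + 4 bytes Adam moments
      = 16 bytes per parameter.
    Activations and framework overhead come on top of this.
    """
    return num_params * 16 / 1024**3

mem = model_state_memory_gb(6.7e9)
print(f"~{mem:.0f} GB for model states alone")
```

That is roughly 100 GB for model states before activations. Sharded across 8x A6000 (384 GB total) it fits with headroom; across 4x A100 40GB (160 GB total) it becomes tight once activations and overhead are included, consistent with the experience above.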