IDEA-XL / PRESTO

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
https://arxiv.org/abs/2406.13193
Apache License 2.0

Hardware constraint for Stage-2 #6

Open ajaysinh-x opened 2 hours ago

ajaysinh-x commented 2 hours ago

Hi @CiaoHe, thanks for sharing this great work. I was trying to reproduce the results you report in the paper. You mention in the paper that you did full fine-tuning in stage-2. I tried to fully fine-tune the model in stage-2 on my 4x A100 40GB GPUs, but it seems they are not enough. Can you share the exact hardware requirements for full fine-tuning the model?

Also, there are about 6.7B trainable parameters in stage-2. Are we training all of them, or only a subset of the parameters?

Moreover, have you tried training stage-2 with LoRA? What are your thoughts on training stage-2 and stage-3 with LoRA?

CiaoHe commented 2 hours ago
  1. In stage-2, we trained with full fine-tuning on 8x A6000 GPUs (48GB each), so 4x A100 40GB will be quite challenging (see the rough memory estimate below).
  2. We did try training with LoRA and ran an ablation (see Figure 6 or Appendix C.1). If you don't need to preserve the model's ability on other general tasks (such as reasoning and knowledge evaluation), we suggest full fine-tuning. A minimal LoRA sketch follows below.
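
For rough intuition on point 1 (a back-of-the-envelope estimate, not a measurement from the authors), full fine-tuning ~6.7B parameters with Adam in mixed precision typically needs about 16 bytes per parameter of model and optimizer state; the byte counts below are standard assumptions, not repo-specific numbers:

```python
# Rough memory estimate for full fine-tuning ~6.7B params with Adam in mixed precision.
# Per-parameter bytes: bf16 weights (2) + bf16 grads (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4) = 16 bytes. These are typical
# assumptions, not numbers taken from the PRESTO repo.
params = 6.7e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
model_states_gb = params * bytes_per_param / 1024**3
print(f"~{model_states_gb:.0f} GB of model/optimizer states before activations")
# -> roughly 100 GB before activations and CUDA overhead. Without ZeRO-style
#    sharding (e.g. DeepSpeed ZeRO-2/3), each GPU would need to hold all of it.
```

On 8x A6000 (48GB each, 384GB total) that state fits comfortably once sharded across GPUs; on 4x A100 40GB (160GB total) it becomes very tight once activations are added, which matches the "quite challenging" comment above.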
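For point 2, if you want to experiment with LoRA for stage-2/3 yourself, here is a minimal sketch using Hugging Face peft. The checkpoint path, rank, alpha, and target module names are illustrative assumptions, not the exact configuration used in the paper's ablation:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical path; substitute the stage-1 checkpoint you actually start from.
model = AutoModelForCausalLM.from_pretrained("path/to/stage1-checkpoint")

lora_config = LoraConfig(
    r=16,                      # rank; an illustrative value, not the paper's setting
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical LLaMA-style attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are updated, a small fraction of the 6.7B
```

With LoRA (plus gradient checkpointing), 4x A100 40GB would likely be workable, but per the ablation referenced above, full fine-tuning is preferred when preserving general abilities is not a concern.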