Closed ds22058 closed 6 months ago
Hello. Do "A," "B," and "C" refer to three training stages? Training a 7B model on 8 A100 GPUs with 80GB memory, the first stage takes around 9 hours, the second around 30 hours, and the third around 12 hours.
Hi @wcy1122, I was wondering if you could share the wandb logs for each stage so we can compare loss curves for reproduction? Thanks!
Hi! How much time did training for "A," "B," and "C" each take on 8 A100 GPUs with 80GB memory?