Open wuxuanttt opened 11 months ago
Hi, training time can vary greatly depending on the setup you use (e.g. the number and size of GPUs, the batch size, and the datasets). If you are referring to pre-training on our setup, using 8 A100 80GB GPUs with a batch size of 2 per GPU, it takes around 15 days.