pupu-chenyanyan opened this issue 2 weeks ago
It depends on batch size, frame_num, img_width, activation_checkpointing, mixed_precision type, distributed_type, etc. Stage 1 needs 10-20 GB; stage 2 typically needs 40-80 GB.
Thank you for your reply. With the default settings on NVIDIA A800s, how many GPUs and how many training days are needed? Since I only have two A6000s, I would like to know the details so I can evaluate whether training is feasible on my devices. Thanks again!
8 A800s take about a week. Customizing the config files (fewer epochs, smaller batch size, fp16, DeepSpeed ZeRO-2, num_frame=16, width=320) will reduce it to about two days.
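For reference, the memory-reducing overrides above might look like the sketch below. This is only an illustration: the key names (`num_train_epochs`, `train_batch_size`, `num_frame`, `img_width`, etc.) are assumptions and should be matched to the project's actual config schema.

```yaml
# Hypothetical config overrides -- key names are assumptions,
# adapt them to this repo's real config files.
num_train_epochs: 10          # reduced from the default
train_batch_size: 1           # smaller batch to fit in GPU memory
mixed_precision: fp16         # roughly halves activation memory vs fp32
num_frame: 16                 # instead of 32
img_width: 320                # lower resolution
activation_checkpointing: true
deepspeed:
  zero_optimization:
    stage: 2                  # ZeRO-2 shards optimizer state and gradients
```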
Hi, I also have a question about memory. I found that 40 GB is not enough to train with 32 frames. Roughly how much memory does that take, and is it possible to train 32 frames with 40 GB?
Hello, thank you for your wonderful work. How much GPU memory is needed for the first and second training stages of this project? And how many GPUs and training days are required?