Srijith-rkr / Whispering-LLaMA

EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
MIT License
232 stars 16 forks

Is DDP valid in the training code? #12

Closed yangdongdong2000 closed 1 month ago

yangdongdong2000 commented 1 month ago

I want to ask whether the DDP strategy is valid in the training code. In train, the function save_model_checkpoint seems to save the model only on the process where global_rank equals 0. With two GPUs, the training code looks like it trains two models in parallel on different data but saves only the first one.

Srijith-rkr commented 1 month ago

DDP replicates the model across all GPUs. During the backward pass, the gradients are synchronized (averaged) across all of these copies, so every copy applies the same update and the models remain identical. Because the copies are identical, saving the model only where global_rank == 0 is valid and avoids writing redundant checkpoints from the other ranks.
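
To make the argument concrete, here is a toy simulation in plain Python. It is not the repository's training code and does not use PyTorch. The one-parameter model, the data shards, and the helper names (`local_gradient`, `all_reduce_mean`, `train`) are all made up for illustration. It only shows that replicas which start from the same weights and apply the same averaged gradient stay identical, even though each rank sees different data.

```python
# Toy simulation of DDP-style gradient averaging (illustrative only, no PyTorch).
# Model: y_hat = w * x, with squared-error loss per sample.

def local_gradient(w, shard):
    """Mean gradient of (w*x - y)^2 with respect to w over one rank's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the gradient all-reduce: every rank receives the average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train(shards, steps=50, lr=0.01, w0=0.0):
    # Every replica starts from the same initial weight.
    weights = [w0] * len(shards)
    for _ in range(steps):
        # Each rank computes a gradient on its own, different, data shard.
        grads = [local_gradient(w, shard) for w, shard in zip(weights, shards)]
        # Gradients are averaged before the optimizer step on every rank.
        synced = all_reduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights

# Hypothetical data: two ranks, each with its own samples of y = 2x.
shards = [
    [(1.0, 2.0), (2.0, 4.0)],  # rank 0
    [(3.0, 6.0), (4.0, 8.0)],  # rank 1
]

weights = train(shards)

# Only "rank 0" writes a checkpoint; the other replica holds the same weight.
checkpoint = {"w": weights[0]}
```

In this sketch `weights[0]` and `weights[1]` end up equal, so the single checkpoint captures what both ranks learned from the combined data rather than one of two separately trained models.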