As I understand it, in the pretrain code the batch is read on TP rank 0 and then broadcast to the other GPUs in the tensor-parallel group.
However, if I enable the option `train_valid_test_datasets_provider.is_distributed = True` when building the dataloader, the dataloader gets initialized on every GPU, and it seems every rank returns the same data on each iteration. In that case, what does `tensor_parallel.broadcast_data` actually do? A rough sketch of the pattern I'm referring to is below.
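For reference, this is roughly the pattern I mean from `get_batch` in the pretrain script (paraphrased from memory, so the exact details and variable names may differ):

```python
import torch
from megatron.core import tensor_parallel

def get_batch(data_iterator):
    """Rough sketch of the batch-fetching pattern in the pretrain code (paraphrased)."""
    keys = ['text']
    datatype = torch.int64

    # Normally only TP rank 0 has a data iterator and pulls the batch;
    # the other TP ranks pass None and receive the tensors via broadcast.
    if data_iterator is not None:
        data = next(data_iterator)
    else:
        data = None

    # Broadcast the tensors listed in `keys` from TP rank 0 to the rest
    # of the tensor-parallel group.
    data_b = tensor_parallel.broadcast_data(keys, data, datatype)
    tokens = data_b['text'].long()
    return tokens
```

With `is_distributed = True`, every TP rank already has its own dataloader producing (apparently identical) batches, so I don't see what the broadcast adds here.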
I am not sure I have understood the data-broadcasting procedure correctly, so I would be very grateful for any clarification.
Thanks.