NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[QUESTION] tensor_parallel.broadcast_data and train_valid_test_datasets_provider.is_distributed = True #1125

Open KookHoiKim opened 2 months ago

KookHoiKim commented 2 months ago

In my understanding, the pretraining code broadcasts the data from TP rank 0 to the other GPUs in the same tensor-parallel group.
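For reference, this is roughly the pattern I mean (a simplified sketch of a `get_batch`-style function, not the exact Megatron-LM source; the function name and the `'text'` key are illustrative, but `tensor_parallel.broadcast_data` and `mpu.get_tensor_model_parallel_rank` are the actual APIs):

```python
import torch
from megatron.core import mpu, tensor_parallel


def get_batch_sketch(data_iterator):
    """Simplified sketch of a get_batch-style function (illustrative only).

    Only TP rank 0 is expected to pull from the data iterator; the other
    ranks in the tensor-parallel group receive the batch via broadcast.
    """
    keys = ['text']          # fields to broadcast
    datatype = torch.int64   # all broadcast fields must share this dtype

    if mpu.get_tensor_model_parallel_rank() == 0:
        data = next(data_iterator)   # dict of CPU tensors, e.g. {'text': ...}
    else:
        data = None                  # non-zero TP ranks pass None

    # broadcast_data moves the tensors to GPU on TP rank 0 and broadcasts
    # them within the tensor-model-parallel group, so every TP rank ends up
    # with an identical batch.
    data_b = tensor_parallel.broadcast_data(keys, data, datatype)
    return data_b['text']
```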

However, if I activate the option `train_valid_test_datasets_provider.is_distributed = True` while building the dataloader, the dataloader is initialized on every GPU, and the dataloaders appear to return the same data on every iteration. What does `tensor_parallel.broadcast_data` do in that case?
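For context, the dataloader-construction branching I am referring to looks roughly like this (a paraphrased sketch of the logic in the training setup code, not the exact Megatron-LM source; the function name and arguments are made up for illustration):

```python
from megatron.core import mpu


def build_data_loaders_sketch(datasets_provider, build_fn):
    """Illustrative sketch of how is_distributed changes dataset building."""
    # When the provider is marked as distributed, every rank builds its own
    # datasets; otherwise only TP rank 0 builds them, and the batch is later
    # shared with the other TP ranks via tensor_parallel.broadcast_data.
    is_distributed = getattr(datasets_provider, "is_distributed", False)

    if is_distributed or mpu.get_tensor_model_parallel_rank() == 0:
        train_ds, valid_ds, test_ds = build_fn(datasets_provider)
    else:
        train_ds, valid_ds, test_ds = None, None, None
    return train_ds, valid_ds, test_ds
```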

I am not sure I fully understand the data-broadcasting procedure, so I would be very grateful for any information about this. Thanks.

github-actions[bot] commented 3 weeks ago

Marking as stale. No activity in 60 days.