A question about dataloader setting: `shuffle`?

facebookresearch / swav

PyTorch implementation of SwAV https://arxiv.org/abs/2006.09882

Other

1.99k stars 280 forks

A question about dataloader setting: `shuffle`? #103

Open Classmate-Huang opened 2 years ago

Classmate-Huang commented 2 years ago

Excellent work!

I see that your training script does not set `shuffle=True` when creating the `DataLoader`. I wonder whether this setting has any effect on performance?

Does using `shuffle=True` have a positive or a negative effect?

Yuxin-Du-Lab commented 1 year ago

Same question. Have you reached a conclusion? Thx

Yuxin-Du-Lab commented 1 year ago

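Adding an explanation: `shuffle` in the `DistributedSampler` is `True` by default. When a sampler is passed to the `DataLoader`, you should not also set `shuffle=True` on the `DataLoader`: the two options are mutually exclusive, and PyTorch raises a `ValueError` if both are given. The `DistributedSampler` shuffles the dataset indices with a seed that is the same on every process (combined with the current epoch), and each process then takes its own disjoint subset of the shuffled indices. Therefore, in a distributed environment, shuffling should be controlled only through the `DistributedSampler`, and `sampler.set_epoch(epoch)` should be called at the start of each epoch so that the order changes between epochs.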
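
For illustration, here is a minimal sketch of that behaviour, not taken from the SwAV code. It assumes a toy dataset of 16 integers and simulates two processes by passing `num_replicas` and `rank` explicitly, so no process group has to be initialized. The helper `indices_for` and the variable `both_allowed` are hypothetical names used only for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 16 integers (illustrative only); each value equals its index.
dataset = TensorDataset(torch.arange(16))


def indices_for(rank, epoch, world_size=2, seed=0):
    """Return the samples one simulated process sees in one epoch."""
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=True, seed=seed
    )
    # Without set_epoch, every epoch would reuse the same shuffled order.
    sampler.set_epoch(epoch)
    # Note: no shuffle argument here; the sampler does the shuffling.
    loader = DataLoader(dataset, batch_size=4, sampler=sampler)
    return [int(i) for (batch,) in loader for i in batch]


rank0 = indices_for(rank=0, epoch=0)
rank1 = indices_for(rank=1, epoch=0)
print("epoch 0, rank 0:", rank0)
print("epoch 0, rank 1:", rank1)
print("epoch 1, rank 0:", indices_for(rank=0, epoch=1))

# Passing both a sampler and shuffle=True is rejected by DataLoader.
try:
    DataLoader(
        dataset,
        sampler=DistributedSampler(dataset, num_replicas=2, rank=0),
        shuffle=True,
    )
    both_allowed = True
except ValueError:
    both_allowed = False
print("sampler + shuffle=True allowed:", both_allowed)
```

Within one epoch the two ranks receive disjoint halves of the same shuffled permutation, and calling `set_epoch` with a new epoch number reseeds that permutation.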