Hi,
Following the instructions for both 'Unsupervised Training' and 'Linear Classification', I find that different model parameters are initialized in each GPU worker, because no random seed is set inside the `main_worker` function.
For PyTorch `DistributedDataParallel`, do you think initializing the same set of model parameters across all GPU workers would give more accurate gradients and better performance?
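For reference, a minimal sketch of the kind of seeding I have in mind, assuming the `main_worker(gpu, ngpus_per_node, args)` signature used in the training script; the fixed seed value is only an illustration:

```python
import random

import numpy as np
import torch


def main_worker(gpu, ngpus_per_node, args):
    # Hypothetical: set the same seed in every worker before the model is built,
    # so each process starts from identical parameters.
    seed = 0
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # ... existing setup follows: dist.init_process_group(...),
    # model construction, and wrapping in
    # torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
```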
Thanks!