Open qingfengmingyue opened 3 years ago
That's expected. With only one worker, there is no need to exchange gradients.
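To illustrate the point, here is a minimal sketch in plain Python. It is not the library's actual code; `average_gradients` is a hypothetical helper, and gradients are modeled as plain lists of floats. It only shows that averaging over a single worker returns that worker's gradients unchanged, so nothing needs to be exchanged.

```python
from typing import List


def average_gradients(per_worker_grads: List[List[float]]) -> List[float]:
    """Element-wise mean of the gradient vectors reported by each worker.

    Hypothetical helper for illustration only; real frameworks do this
    over the network via servers or all-reduce.
    """
    num_workers = len(per_worker_grads)
    if num_workers == 1:
        # A single worker already holds the "averaged" gradient,
        # so there is nothing to exchange.
        return list(per_worker_grads[0])
    # With several workers, each gradient entry is averaged across workers.
    return [sum(vals) / num_workers for vals in zip(*per_worker_grads)]


single = average_gradients([[0.5, -1.0]])
pair = average_gradients([[1.0, 2.0], [3.0, 4.0]])
```

With one worker the result is just a copy of the input; with two workers the entries are averaged, which is the step that requires communication.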
What kind of configuration is efficient? Can different workers have different numbers of GPUs?
2 workers and 2 servers