sorenrasmussenai opened 4 years ago
It appears to me that shuffle-BN has no effect when run on a single GPU: BatchNorm computes its statistics over the whole batch, so permuting the samples within one batch leaves those statistics unchanged.
Example:
```python
import torch
import torch.nn as nn

(B, C, H, W) = 4, 3, 2, 2
model1 = nn.Sequential(nn.BatchNorm2d(C))
model2 = nn.Sequential(nn.BatchNorm2d(C))

print("Before:")
print("  model1 stats: ", model1[0].running_mean, model1[0].running_var)
print("  model2 stats: ", model2[0].running_mean, model2[0].running_var)

shuffle_ids = torch.randperm(B).long()
x1 = torch.randn(B, C, H, W) * 3 + 1
x2 = x1[shuffle_ids]

model1(x1)
model2(x2)

print("After:")
print("  model1 stats: ", model1[0].running_mean, model1[0].running_var)
print("  model2 stats: ", model2[0].running_mean, model2[0].running_var)
```
```
Before:
  model1 stats:  tensor([0., 0., 0.]) tensor([1., 1., 1.])
  model2 stats:  tensor([0., 0., 0.]) tensor([1., 1., 1.])
After:
  model1 stats:  tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])
  model2 stats:  tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])
```
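This outcome is expected, since per-channel batch statistics are permutation-invariant. A quick standalone check (a minimal sketch; shapes and the shuffle mirror the example above):

```python
import torch

B, C, H, W = 4, 3, 2, 2
x1 = torch.randn(B, C, H, W) * 3 + 1
x2 = x1[torch.randperm(B)]

# Per-channel batch mean/var are identical regardless of sample order,
# so a within-batch shuffle cannot change what BatchNorm computes.
print(torch.allclose(x1.mean(dim=(0, 2, 3)), x2.mean(dim=(0, 2, 3))))  # True
print(torch.allclose(x1.var(dim=(0, 2, 3)), x2.var(dim=(0, 2, 3))))    # True
```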
I guess another approach is necessary on a single GPU. Any thoughts?
Thanks for releasing this code.
The simplest solution would probably be to emulate the multi-GPU implementation on a single GPU:
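One way to do that (a sketch, not the released code) is a split BatchNorm that divides the batch into `num_splits` groups, each normalized with its own statistics, as if each group lived on a separate GPU; shuffling the key batch then actually changes which samples share statistics. The class name `SplitBatchNorm` and the `num_splits` parameter are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitBatchNorm(nn.BatchNorm2d):
    """BatchNorm2d that normalizes each of `num_splits` chunks of the
    batch with independent statistics, mimicking per-GPU BN on one device."""

    def __init__(self, num_features, num_splits=2, **kwargs):
        super().__init__(num_features, **kwargs)
        self.num_splits = num_splits

    def forward(self, x):
        N, C, H, W = x.shape
        if self.training or not self.track_running_stats:
            # Fold the splits into the channel dim so each group of samples
            # gets its own mean/var, as if it lived on a separate GPU.
            mean = self.running_mean.repeat(self.num_splits)
            var = self.running_var.repeat(self.num_splits)
            out = F.batch_norm(
                x.view(-1, C * self.num_splits, H, W), mean, var,
                self.weight.repeat(self.num_splits),
                self.bias.repeat(self.num_splits),
                True, self.momentum, self.eps).view(N, C, H, W)
            # Average the per-split statistics back into the running buffers.
            self.running_mean.data.copy_(mean.view(self.num_splits, C).mean(0))
            self.running_var.data.copy_(var.view(self.num_splits, C).mean(0))
            return out
        return F.batch_norm(
            x, self.running_mean, self.running_var,
            self.weight, self.bias, False, self.momentum, self.eps)
```

With BN layers like this in the key encoder, the training loop could mimic shuffle-BN on one GPU: draw `idx = torch.randperm(N)`, encode `x[idx]`, then restore order with `torch.argsort(idx)`. I haven't verified this matches the multi-GPU behavior exactly, but the shuffle is no longer a no-op, since it changes which samples are normalized together.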