I have an idea as an extension of your work.

I have just read your paper, and I do not have enough GPU resources or coding ability to test my idea myself. I hope you can try it if you think it is reasonable.

In your work, you re-register the mean and standard deviation of each convolution layer's output in the pruned candidate networks, before the scale and bias terms of the batch-normalization layer (what you call adaptive batch normalization). As I understand it, this only works for networks that contain batch-normalization layers.

My idea is simple, but I think it can be applied to any convolutional network: before pruning, you apply a pseudo batch-norm to the layers of your network.

For the output $x^o$ of a convolution layer in the original network (one that is not followed by a batch-norm layer), you can apply a pseudo batch-norm that produces the layer output $x$:

$x=\frac{x^o-\mu_{x}^{o}}{\sigma_{x}^{o}}\cdot\sigma_x^o+\mu_x^o$

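where $(\cdot)^o$ denotes a statistic of the original network (before pruning). Because the normalization and the rescaling cancel, this is an identity mapping and leaves the original network unchanged. For a pruned candidate network, I think you can simply re-register the mean and std as follows, normalizing with the pruned network's statistics and rescaling with the original ones: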

$x=\frac{x^p-\mu_{x}^{p}}{\sigma_{x}^{p}}\cdot\sigma_x^o+\mu_x^o$

where $(\cdot)^p$ denotes the corresponding statistic of the pruned network.
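A minimal NumPy sketch of the two equations, assuming per-channel statistics computed over the batch and spatial axes of an `(N, C, H, W)` conv output, as batch norm does, and a layer whose output channels are all kept. The helper names `channel_stats` and `pseudo_bn`, the `eps` term, and the random arrays standing in for conv outputs are hypothetical illustrations, not part of EagleEye.

```python
import numpy as np

def channel_stats(x, eps=1e-5):
    """Per-channel mean and std of an (N, C, H, W) array (assumed BN-style axes)."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(0, 2, 3), keepdims=True) + eps)
    return mu, sigma

def pseudo_bn(x, mu_in, sigma_in, mu_target, sigma_target):
    """Normalize with (mu_in, sigma_in), then rescale to (mu_target, sigma_target)."""
    return (x - mu_in) / sigma_in * sigma_target + mu_target

rng = np.random.default_rng(0)

# Stand-in for a conv output x^o of the original network.
x_o = rng.normal(loc=2.0, scale=3.0, size=(64, 8, 4, 4))
mu_o, sigma_o = channel_stats(x_o)

# On the original network the pseudo batch-norm is an identity mapping.
y_o = pseudo_bn(x_o, mu_o, sigma_o, mu_o, sigma_o)

# Stand-in for the same layer's output x^p after pruning: its distribution has shifted.
x_p = rng.normal(loc=0.5, scale=1.2, size=(64, 8, 4, 4))
mu_p, sigma_p = channel_stats(x_p)

# Re-register: normalize with the pruned statistics, rescale with the original ones.
y_p = pseudo_bn(x_p, mu_p, sigma_p, mu_o, sigma_o)

# The adjusted output now has (up to the eps term) the original mean and std.
mu_y, sigma_y = channel_stats(y_p)
print(np.abs(mu_y - mu_o).max(), np.abs(sigma_y - sigma_o).max())
```

If filters of this layer were pruned, one would presumably index `mu_o` and `sigma_o` with the kept channels before rescaling.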

My reasoning is this: if the pruned model's layer outputs have the same statistics (mean and std) as the original model's, as in your work, then this adjustment may give results similar to yours, while also working for models without batch-norm layers.

(anonymous47823493 / EagleEye, issue #33)