I have just read your paper, but I do not have enough GPU resources or coding ability to test my idea.
I hope you can try it if you think it is reasonable.
In your work, you re-register the mean and standard deviation of a convolution layer's output for each pruned candidate network, before the scale and bias terms of the batch-normalization layer (you call this adaptive batch normalization).
As I understand it, this can only be used for networks with batch-normalization layers.
My idea is simple, but I think it can be applied to any convolutional network.
Before pruning, you can apply a pseudo batch-norm to your network.
For an original conv layer's output (one that is not followed by a batch-norm layer), you can apply a pseudo batch-norm to the layer's output, using the statistics (mean and std) of the original network (before pruning).
For the pruned candidate network, I think you can just re-register the std and mean in that pseudo batch-norm, using the statistics of the pruned network.
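One way to write the two steps down, as a sketch only: the symbols and the exact form here are an assumption about what is intended, not a confirmed formula. Let $x$ be the conv layer's output in the original network and $x'$ its output in the pruned candidate, with per-channel statistics $(\mu_o, \sigma_o)$ for the original network and $(\mu_p, \sigma_p)$ for the pruned one.

```latex
\begin{align}
y  &= \sigma_o \, \frac{x - \mu_o}{\sigma_o} + \mu_o = x
   && \text{(original network: an identity map)} \\
y' &= \sigma_o \, \frac{x' - \mu_p}{\sigma_p} + \mu_o
   && \text{(pruned candidate: statistics re-registered)}
\end{align}
```

Under this reading, the first line leaves the original network unchanged, and the second line gives the pruned layer's output the mean $\mu_o$ and std $\sigma_o$ of the original layer.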
The reason for my adjustment is this: if the pruned model has the same statistics (mean and std) as the original model (as in your work), then it may give results similar to your work, while also being usable for models without batch-norm layers.
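A minimal NumPy sketch of the idea, under stated assumptions: the function names (`channel_stats`, `pseudo_batch_norm`) are hypothetical, the statistics are taken per channel over a batch of shape (N, C, H, W), and `x_pruned` is only a synthetic stand-in for a pruned layer's output, not the result of a real pruning step. It has not been tested on a real network.

```python
import numpy as np

def channel_stats(x):
    """Per-channel mean and std of a conv output with shape (N, C, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    std = x.std(axis=(0, 2, 3), keepdims=True)
    return mean, std

def pseudo_batch_norm(x, mean, std, ref_mean, ref_std, eps=1e-5):
    """Normalize x with (mean, std), then restore the reference statistics."""
    return ref_std * (x - mean) / (std + eps) + ref_mean

rng = np.random.default_rng(0)

# Stand-in for the original conv layer's output (no batch-norm after it).
x_orig = rng.normal(loc=2.0, scale=3.0, size=(64, 8, 16, 16))
mu_o, sigma_o = channel_stats(x_orig)

# Before pruning: normalizing and restoring with the same statistics
# is (up to eps) an identity map, so the original network is unchanged.
y_orig = pseudo_batch_norm(x_orig, mu_o, sigma_o, mu_o, sigma_o)

# Synthetic stand-in for the pruned layer: its output distribution has drifted.
x_pruned = 0.5 * x_orig + rng.normal(scale=0.3, size=x_orig.shape) - 1.0
mu_p, sigma_p = channel_stats(x_pruned)

# After pruning: re-register the pruned statistics, keep the original
# ones as the target, so the output matches the original mean and std.
y_pruned = pseudo_batch_norm(x_pruned, mu_p, sigma_p, mu_o, sigma_o)
mu_y, sigma_y = channel_stats(y_pruned)
```

In a real network, the statistics would be estimated from a few batches of training data, and if pruning removes some of this layer's output channels, only the original statistics of the kept channels would be used.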
Thank you for your extraordinary thinking!
Your theory looks right! Your idea is like inserting a BN layer into a model without BN.
I will take the time to experiment with your idea.