IBM / aihwkit

IBM Analog Hardware Acceleration Kit
https://aihwkit.readthedocs.io
MIT License

Would there be any way of fixing the value of specific weights in nn.AnalogLinear? #150

Closed. nkyungmi closed this issue 3 years ago.

nkyungmi commented 3 years ago

Hi! I'm trying to modify the values of specific weights in an nn.AnalogLinear layer. I'm currently using an MLP (784-256-128-10), and each layer is made of nn.AnalogLinear. I know how to access and initialize each weight value, but what I want to do is not only modify the values but also fix them for the entire training process, without affecting the gradients. Would there be any way of doing this using AnalogLinear?

(Sorry for not properly adding the 'question' label; I'm still new to this GitHub system!)
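For context, here is a minimal sketch of the setup I'm describing (the activation choices are arbitrary, and the get_weights/set_weights calls just illustrate the weight access I mentioned):

from torch import nn
from aihwkit.nn import AnalogLinear

# 784-256-128-10 MLP where every linear layer is analog (sketch only)
model = nn.Sequential(
    AnalogLinear(784, 256), nn.Sigmoid(),
    AnalogLinear(256, 128), nn.Sigmoid(),
    AnalogLinear(128, 10), nn.LogSoftmax(dim=1),
)

# Accessing / setting the weights of one analog layer
weights, biases = model[0].get_weights()
model[0].set_weights(weights, biases)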

maljoras commented 3 years ago

Thank you very much for your question, @nkyungmi! If I understand your question correctly, then you do not want to train the analog weights at all, but maybe some other (digital) parts of the DNN?

In principle, one can just set the learning rate of a particular analog layer to zero; however, when using the AnalogSGD, the overall learning rate of the optimizer will be used instead.

We do not support different learning rates for analog and digital components in our AnalogSGD for now (we might add this in the near future), but one quick (and dirty) way to achieve a similar thing is to not use the AnalogSGD and instead use one of PyTorch's native optimizers (e.g. torch.optim.SGD), and also omit the parameter regrouping (i.e. delete this line: https://github.com/IBM/aihwkit/blob/master/examples/01_simple_layer.py#L47). This should then fix the analog weights, since the optimizer is not aware of the analog features, but it should still train the other (digital) parts of the DNN. Note, however, that in this case any decay, diffusion, or drift process is omitted as well, since those are usually handled by the AnalogSGD. Other non-linearities and noise in the forward and backward passes will still be present, though.
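A rough sketch of this workaround (the train_loader and criterion are assumed and not shown here; the model is assumed to contain both analog and digital layers):

import torch

# Use PyTorch's native SGD and skip regroup_param_groups, so the optimizer
# never learns about the analog tiles; only the digital parameters get updated.
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for x, y in train_loader:          # train_loader is assumed to exist
    opt.zero_grad()
    loss = criterion(model(x), y)  # criterion is assumed, e.g. nn.NLLLoss()
    loss.backward()
    opt.step()                     # analog weights should stay fixed; digital weights train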

In case you do not want to train the network at all, but only compute the backward pass, you can just omit the optimizer step altogether (i.e. delete the opt.step() call).
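A minimal sketch of that case, with the same assumed model and criterion as above:

# Only compute forward and backward; without opt.step() no weights change.
out = model(x)            # x, y are an assumed batch of inputs and targets
loss = criterion(out, y)
loss.backward()           # gradients are computed through the analog backward pass
# (no opt.step() here)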

maljoras commented 3 years ago

@nkyungmi I have to correct my answer above: it is actually already possible to specify a different learning rate for each layer, or to turn off learning for all analog layers!

In PyTorch, each parameter group can in fact have its own learning rate. So you can indeed use our AnalogSGD and specify a learning rate per layer. For instance, if you want to set all analog learning rates to zero, you can do the following (after regrouping the parameters as in the examples):

from aihwkit.optim import AnalogSGD

opt = AnalogSGD(model.parameters(), lr=0.1)
opt.regroup_param_groups(model)   # creates one parameter group per analog tile
for group in opt.param_groups:
    if group.get('analog_tile'):  # group belongs to an analog tile
        group['lr'] = 0.0         # freeze all analog weights

That would turn off learning for all the analog tiles. Similarly, if you only want to turn off a specific layer (let's say model[0] is the AnalogLinear layer you want to fix), you could do:

opt = AnalogSGD(model.parameters(), lr=0.1)
opt.regroup_param_groups(model)
for group in opt.param_groups:
    if group.get('analog_tile') == model[0].analog_tile:  # only the first layer's tile
        group['lr'] = 0.0                                 # freeze the analog weights of model[0]

before the training. That should set only the learning rate of the first analog layer to 0, and thus fix its weight values.
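As a quick sanity check (a sketch, reusing the optimizer from above), you can print the learning rate of each parameter group to confirm which ones were set to zero:

for i, group in enumerate(opt.param_groups):
    kind = 'analog' if group.get('analog_tile') else 'digital'
    print(i, kind, group['lr'])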

nkyungmi commented 3 years ago

Thank you for your very detailed answer!! Yes, I was trying to train the network outside of these analog tiles and then put the weight values back into the layer. I now see that I can stop training for some layers in the network by using the method you mentioned above. I'll try that. Thank you again :)!!