IBM / aihwkit

IBM Analog Hardware Acceleration Kit
https://aihwkit.readthedocs.io
MIT License

drift on non-backpropagation based algorithms #634

Open imanehmz opened 8 months ago

imanehmz commented 8 months ago

Description and motivation

I'm trying to run inference at different drift timesteps on a neural network trained with Feedback Alignment from biotorch; however, it shows the same accuracy for all timesteps. Here's the pseudo-code I'm using:

import torch.nn.functional as F
from torchvision.models import resnet18
from biotorch.module.biomodule import BioModule
from aihwkit.nn.conversion import convert_to_analog
from aihwkit.optim import AnalogSGD

# Wrap the digital model so it trains with Feedback Alignment ('fa').
model = resnet18()
model = BioModule(module=model, mode='fa')
# <training loop on GPU>

# create_rpu_config_new() is my own helper that builds the rpu_config.
rpu_config = create_rpu_config_new()
analog_model = convert_to_analog(model, rpu_config)
analog_optimizer = AnalogSGD(analog_model.parameters(), lr=0.01)
# <training loop of analog_model>

analog_model.eval()
t_inference_list = [1, 60, 3600, 3600 * 24, 3600 * 24 * 7]
test_inference(analog_model, F.cross_entropy, val_loader, t_inference_list)

I'm getting equal accuracy after drift for all timesteps. The Feedback Alignment training method doesn't use the backpropagated gradients in the backward pass; it uses a random feedback matrix instead. However, when evaluating the drift we're in eval mode, so that shouldn't matter: to my understanding, the drift noise is applied to the weights of the model, not to the gradients. Why is there no loss in accuracy with drift?
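For reference, here is a minimal sketch of the drift evaluation I understand test_inference to perform (drift_analog_weights is the aihwkit call for applying drift to a converted model; evaluate is a hypothetical accuracy helper):

analog_model.eval()
for t_inference in t_inference_list:
    # Apply drift to the programmed weights for t_inference seconds.
    analog_model.drift_analog_weights(t_inference)
    accuracy = evaluate(analog_model, val_loader)  # hypothetical helper
    print(f"t = {t_inference:>10} s: accuracy = {accuracy:.4f}")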

Proposed solution

It would be great to add support for applying drift on analog hardware to models trained with methods other than backpropagation.

Here's the link to a Colab notebook that runs the code of an example.

maljoras commented 8 months ago

From looking at your code, it seems that you have done it correctly. In general, though, the drift time should be given as a float, not an int. I would think that the issue in your case is that the out_bound setting is much too restrictive (around 5): it would mean that only about 5 inputs can be active with weights at gmax before the output gets clipped, which is very restrictive. Note that the weights are rescaled to gmax when mapped, since a digital output scale is used in your setting.
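As an illustration, here is a sketch of how a less restrictive out_bound could be set on an inference config (the values here are illustrative assumptions, not the settings from the notebook):

from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.inference import PCMLikeNoiseModel

rpu_config = InferenceRPUConfig()
rpu_config.forward.out_bound = 10.0  # relax the output clipping bound
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)  # PCM programming noise and drift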

imanehmz commented 7 months ago

Thank you for the remark. I converted the times into floats and increased the out_bound value, but it didn't change the result. I also tried other rpu_configs from the tutorials available in the aihwkit documentation, and the drift is still not being applied to the weights, so the issue must be coming from somewhere else.
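A quick way to check whether drift actually changes the programmed weights is to compare them before and after a long drift. A sketch, assuming a recent aihwkit release where converted models expose analog_tiles() and tiles expose get_weights():

before = [tile.get_weights()[0].clone() for tile in analog_model.analog_tiles()]
analog_model.drift_analog_weights(3600.0 * 24 * 7)  # drift for one week
after = [tile.get_weights()[0] for tile in analog_model.analog_tiles()]

# If every delta is exactly zero, drift was never applied to the weights.
for w0, w1 in zip(before, after):
    print('max |delta|:', (w1 - w0).abs().max().item())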

kaoutar55 commented 6 months ago

Can you try the rpu config that we have in the tutorial, to check whether the issue persists? link

kaoutar55 commented 5 months ago

@imanehmz, did you make the suggested changes to your hw-aware training configuration?

imanehmz commented 5 months ago

@kaoutar55 Hello, I've used many rpu_configs from the tutorials, including that one, but the issue persists.

kaoutar55 commented 3 months ago

@imanehmz can you please tell us how to reproduce this error? Please share your code or the notebook you tried.

imanehmz commented 3 months ago

Yes, here's the link to the notebook

kaoutar55 commented 3 months ago

It seems you are using an older aihwkit release. Can you use the latest? Look at this example to get the right install command: https://github.com/IBM/aihwkit/blob/master/notebooks/tutorial/hw_aware_training.ipynb
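For reference, a quick way to confirm which release is installed (aihwkit exposes its version string; upgrading is the usual pip command):

import aihwkit

# Print the installed release; if it is old, `pip install -U aihwkit` upgrades it.
print(aihwkit.__version__)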

imanehmz commented 3 months ago

I tried the new release by duplicating the notebook to see if there's a change, but there's still no drift applied to the weights. Maybe it's because of the way the weights are stored in the model after it is converted with biotorch to be trained with feedback alignment.
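That hypothesis can be checked directly: if convert_to_analog did not recognize the biotorch layer types (which wrap or subclass the standard Conv2d/Linear), no analog layers exist and drift has nothing to act on. A sketch, assuming the standard aihwkit.nn layer classes:

from aihwkit.nn import AnalogConv2d, AnalogLinear

# Count the analog layers after conversion; zero means the biotorch
# layers were left digital, so drift can never apply.
analog_layers = [name for name, mod in analog_model.named_modules()
                 if isinstance(mod, (AnalogConv2d, AnalogLinear))]
print(f"{len(analog_layers)} analog layers found")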