IBM / aihwkit

IBM Analog Hardware Acceleration Kit
https://aihwkit.readthedocs.io
Apache License 2.0

Low accuracy of PCM preset #236

Closed: saba-er closed this issue 3 years ago

saba-er commented 3 years ago

When using the PCM preset for training and testing a 3-layer FC network on FashionMNIST, I get a significantly lower accuracy (<60%), whereas with a similar network and training configuration I get much higher accuracy with the EcRAM and ReRAM presets (~85%). The EcRAM and ReRAM experiments were performed using the web-based AIHW Composer, whereas the PCM experiment was performed using the local runner. Do these results make sense? I appreciate any feedback.

Network definition.

```python
INPUT_SIZE = 784
HIDDEN_SIZES = [256, 128]
OUTPUT_SIZE = 10
```

Training parameters.

```python
EPOCHS = 30
BATCH_SIZE = 8
LEARNING_RATE = 0.01
```
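For reference, a simplified sketch of how the analog model and optimizer are set up (not the full script; the standard PCMPreset rpu config is assumed here, and data loading and the training loop are omitted):

```python
import torch.nn as nn

from aihwkit.nn import AnalogLinear, AnalogSequential
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.presets import PCMPreset

INPUT_SIZE = 784
HIDDEN_SIZES = [256, 128]
OUTPUT_SIZE = 10
LEARNING_RATE = 0.01

# Three fully connected analog layers, all using the PCM preset.
rpu_config = PCMPreset()
model = AnalogSequential(
    AnalogLinear(INPUT_SIZE, HIDDEN_SIZES[0], rpu_config=rpu_config),
    nn.Sigmoid(),
    AnalogLinear(HIDDEN_SIZES[0], HIDDEN_SIZES[1], rpu_config=rpu_config),
    nn.Sigmoid(),
    AnalogLinear(HIDDEN_SIZES[1], OUTPUT_SIZE, rpu_config=rpu_config),
    nn.LogSoftmax(dim=1),
)

# Analog-aware SGD optimizer.
optimizer = AnalogSGD(model.parameters(), lr=LEARNING_RATE)
optimizer.regroup_param_groups(model)
```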

maljoras commented 3 years ago

Hi @SaBaKa2020, many thanks for the question!

Indeed, the PCM preset is by far the most challenging preset we have so far: it has only roughly 50 states, extremely large variability, and is uni-directional, which means that it needs to refresh (open-loop read-write), adding another large source of variability (see here for more info about the refresh). You can also run example 10, where the pulse response curves of the different presets are plotted, to get an intuition about their noisiness.
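A quick way to get a feel for this is to plot the device responses yourself, along the lines of example 10. A minimal sketch, assuming the plot_device helper from aihwkit.utils.visualization and preset device classes such as PCMPresetDevice, ReRamESPresetDevice and EcRamPresetDevice (exact names may differ between versions):

```python
import matplotlib.pyplot as plt

from aihwkit.utils.visualization import plot_device
from aihwkit.simulator.presets.devices import (
    PCMPresetDevice,
    ReRamESPresetDevice,
    EcRamPresetDevice,
)

# Plot the noisy pulse response (conductance vs. number of up/down pulses)
# of a few preset devices to compare their number of states and variability.
for device in (PCMPresetDevice(), ReRamESPresetDevice(), EcRamPresetDevice()):
    plot_device(device)

plt.show()
```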

Note that when using the standard presets (those not starting with either TikiTaka.. or MixedPrecision..), the toolkit uses a parallel update scheme directly in analog, which is quite challenging. For PCM, this mode is not going to be successful. For analog training with PCM, the rank update needs to be done in digital, with only the forward/backward passes in analog. For this update scheme we have the MixedPrecisionPCMPreset, which implements exactly that (assuming more digital compute and thus potential runtime impact, see here).
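In code, switching is essentially just a matter of passing a different rpu_config to the analog layers, roughly like this (sketch):

```python
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.presets import MixedPrecisionPCMPreset

# The rank update is accumulated digitally and transferred to the analog
# PCM weights; forward and backward passes still run through the analog model.
rpu_config = MixedPrecisionPCMPreset()
layer = AnalogLinear(784, 256, rpu_config=rpu_config)
```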

In contrast to PCM, EcRAM and ReRAM have many more states and are bi-directional, so they perform better with a direct update in analog (ReRAM is also very noisy, but most of this noise is write noise, and the hidden state is nevertheless quite reliable).

saba-er commented 3 years ago

Great clarifications! I can confirm that the experiment with MixedPrecisionPCMPreset achieves very good accuracy (~90%). Thanks @maljoras!

saba-er commented 3 years ago

@maljoras For the same neural network architecture (3-layer fully connected) and training parameter setting, mixed precision training achieves accuracies higher than the digital (floating point) accuracy on FashionMNIST (88.17%, according to the reference accuracy reported in AIHW Composer). Is this a bug, or could it be that "digital" means something different from floating point?

maljoras commented 3 years ago

There is no bug. FP in the toolkit means conventional SGD (no momentum). One needs to optimize the hyperparameters, such as the learning rate, to get the best accuracy. PCM with mixed precision might be better here out of the box because the noise acts as a regularizer and the mixed-precision update is similar to an SGD method with momentum.
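To be concrete, the FP reference is essentially the same digital network trained with plain SGD, e.g. (sketch in plain PyTorch; the momentum value below is only illustrative):

```python
import torch
import torch.nn as nn

# Plain digital (floating point) 3-layer FC network.
model = nn.Sequential(
    nn.Linear(784, 256), nn.Sigmoid(),
    nn.Linear(256, 128), nn.Sigmoid(),
    nn.Linear(128, 10), nn.LogSoftmax(dim=1),
)

# Conventional SGD without momentum: this is what the FP reference uses.
optimizer_fp = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.0)

# Adding momentum (value here is arbitrary) brings the digital baseline
# closer to the momentum-like behavior of the mixed-precision update.
optimizer_momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```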

saba-er commented 3 years ago

Thanks @maljoras! If I may ask one more, rather different, question: you might have already thought of Flash memory as a candidate NVM for neuromorphic computing. Would it be feasible to have a preset for a Flash memory device (NAND/NOR) in this simulator, like the presets you have already implemented for memristors?

maljoras commented 3 years ago

Thank you, @SaBaKa2020! I am not familiar with Flash memory myself, but it could very well be that it is decently modeled by one of our existing device models. If so, and if one had the pulse response curve of such a device, it would be easy to add it to the presets. Maybe you could point me to a reference publication where the response of such a flash memory is measured?
Thanks!

saba-er commented 3 years ago

@maljoras Thank you once again for addressing my questions, I appreciate it. This paper has very good details about the behavior of Flash memory devices: https://www.mdpi.com/2079-9292/10/3/337/htm. Please let me know if you need more details. I am not an expert in this domain, but I may be able to direct you to some experts in the area of Flash memory devices, particularly those we have been in contact with. It would be really exciting to see a preset representing a Flash memory device supported by this simulator :-) Thanks again!

maljoras commented 3 years ago

Hi @SaBaKa2020, we do not have immediate plans to add a flash memory preset, because it is in general problematic to support training on such a device (endurance issues, slow programming). However, there might be new flash developments that I am not aware of. If so, you are welcome to open a separate issue on that, and we would be glad if you (or someone else with knowledge of flash memory) were interested in contributing an adequate preset.