The idea of the analog device in the document

IBM / aihwkit

IBM Analog Hardware Acceleration Kit

https://aihwkit.readthedocs.io

Apache License 2.0

The idea of the analog device in the document #231

Closed garystone1 closed 2 years ago

garystone1 commented 3 years ago

Hi, I am trying to understand the documentation of this toolkit. I am confused by the concepts of pulsed devices and unit cell devices. Some of the parameters of the functions are neither explained nor mentioned in the documentation. Is there anything I can look up to understand these devices? Thx!

maljoras commented 3 years ago

Hi @garystone1, many thanks for your question!

A pulsed device is a device (think of it as a weight element) that implements a pulsed update response curve. In the default setting, a pulsed device will use stochastic pulse trains to update the analog devices in parallel, very similar to what is described in Gokmen & Vlasov, Front. Neurosci. (2016). The exact update behavior is given by the UpdateParameters (see here) but also depends on the device class itself. All available devices are described in detail here.

Unit cell devices are an abstract device class in which one can combine multiple resistive elements per cross point; the resulting overall weight is a (weighted) sum of all conductances (in normalized units). There are many options for how the individual conductances of the unit cell are updated. In the simplest case, they are all updated with identical pulses (VectorUnitCell, see here), or a single device within the unit cell is selected randomly (VectorUnitCell with a different update policy), or one device is updated and the second then receives read-out information from the first (TransferCompound, see here).
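To make the unit cell idea concrete, here is a minimal pure-Python sketch. It is not the toolkit's implementation: the function names, the weighting factors and the two update policies shown are illustrative assumptions that only mirror the description above.

```python
import random

def unit_cell_weight(conductances, gammas):
    """Effective weight: weighted sum of the per-device conductances."""
    return sum(g * c for g, c in zip(gammas, conductances))

def update_all(conductances, dw):
    """Policy 'all': every device in the unit cell receives the same pulse."""
    return [c + dw for c in conductances]

def update_single_random(conductances, dw, rng):
    """Policy 'single random': only one randomly chosen device is pulsed."""
    idx = rng.randrange(len(conductances))
    return [c + dw if i == idx else c for i, c in enumerate(conductances)]

rng = random.Random(0)
cell = [0.10, -0.05, 0.20]   # normalized conductances at one cross point
gammas = [1.0, 1.0, 1.0]     # weighting factors (hypothetical values)

w0 = unit_cell_weight(cell, gammas)
w_all = unit_cell_weight(update_all(cell, 0.01), gammas)
w_one = unit_cell_weight(update_single_random(cell, 0.01, rng), gammas)
print(w0, w_all, w_one)
```

With equal weighting factors, pulsing all three devices moves the effective weight three times as far as pulsing a single one, which is one reason the choice of update policy matters.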

I am not sure what you are referring to when you say that many things are not explained in the documentation. Have you looked into the API documentation? We usually make sure all parameters and settings are explained and documented. For instance, you can take a look at the API documentation of the devices, where each class, method, and attribute is explained with a docstring. It would be great if you could indicate more precisely which functions and parameters you find not to be documented well enough, so that we know where to improve the documentation.

Many thanks!

garystone1 commented 3 years ago

Thank you for your response! I've read the API documentation. I am trying to understand the relation between the functions in the documentation and the properties of real devices. To be more specific about my questions:

1. Does the decay property of a pulsed device represent the weight decay used in neural networks to prevent over-fitting? What does the diffusion property simulate in terms of the characteristics of a real resistive device? What does ξ (the standard Gaussian number) stand for?
2. βij is the parameter for the directional up versus down bias. Does it mean the difference between W+ij and W-ij?
3. What is the difference between the soft-bounds device and the linear step device? What is the meaning of "soft bounds"?
4. Are parameters like dw_min_std, dw_min_dtod and dw_min all measured from the device I want to simulate? If not, how do I set those parameters for my own device?

maljoras commented 3 years ago

Hi @garystone1, thanks for these detailed questions. Some of these points might indeed not be documented well enough, and we will try to improve the documentation. Let me answer them here first:

1. Decay is a decay of the weight value towards zero, similar to typical L2 regularization (however, it is usually not multiplied by the learning rate, it is just a constant decay), applied once per mini-batch. Most typical device materials do not have this property, so it is usually set to zero. For some devices it becomes relevant, however: a capacitor (CapacitorPresetDevice), for instance, will have some leakage. The decay rate can be set to model this, together with device-to-device variation in the decay rate. Diffusion is also typically set to zero, but some devices might diffuse (modelled by adding a Gaussian random number, called ξ, to the weight each mini-batch), so one can set this parameter to see how training is affected by such a temporal disturbance.
2. The directional bias is used for a bi-directional device where an up pulse and a down pulse might have a different size on average. Sometimes one might also want to abstract away two uni-directional devices and model them as one effective bi-directional device. In that case the bias would be the systematic difference in step size between the two devices (for instance for ConstantStepDevice).
3. The underlying model of the SoftBoundsDevice is indeed identical to that of the LinearStepDevice. The LinearStepDevice is just more general, as one can also define additional hard bounds. See also issues #181 and #170 for more discussion.
4. Yes, usually these parameters are measured from your device at hand (see for instance our device presets, which are based on measured devices; see here). Alternatively, one could vary these parameters and ask what effect they have on the accuracy.
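As a rough illustration of decay and diffusion, the per-mini-batch effect can be sketched as follows. This is an assumed form written from the description above, not the simulator's code; the names `decay` and `diffusion` and the exact order of the two steps are illustrative choices.

```python
import random

def apply_decay_and_diffusion(weights, decay, diffusion, rng):
    """Apply one mini-batch worth of decay and diffusion to a list of weights."""
    out = []
    for w in weights:
        w = w * (1.0 - decay)                    # constant pull towards zero
        w = w + diffusion * rng.gauss(0.0, 1.0)  # add xi ~ N(0, 1), scaled
        out.append(w)
    return out

rng = random.Random(42)
weights = [0.5, -0.3, 0.0]

# Leaky device (e.g. a capacitor): decay only
decayed = apply_decay_and_diffusion(weights, decay=0.01, diffusion=0.0, rng=rng)
# Diffusing device: random disturbance only
noisy = apply_decay_and_diffusion(weights, decay=0.0, diffusion=0.01, rng=rng)
print(decayed, noisy)
```

Setting both values to zero leaves the weights untouched, which corresponds to the usual default for most device materials.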

Hope that helps!

garystone1 commented 3 years ago

Thanks, the responses do help a lot. However, I still have some questions about the ConstantStepDevice:

1. Can I think of βij as the difference between the update steps of two uni-directional devices, of Δw^d_{ij} as the behavior of the bi-directional device, and of the equation below as the transformation between the uni-directional devices and the bi-directional device?
2. dw_min is the mean of the minimal update step sizes across devices and directions. Isn't the update step size a constant? What is the meaning of "minimal"? Is it the minimal update step size between the positive and the negative device?
3. Can I also think of bminij and bmaxij as the weight bounds of the devices for positive and negative values?

maljoras commented 3 years ago

1. Indeed, the bias can be seen as the average difference in the expected weight change in response to a single pulse of two implicit uni-directional devices (averaged across all devices and all update pulses). Δw^d_{ij} is the average change of the weight in response to a single voltage pulse of the given device at cross point ij in direction d (up or down). The equation is more concerned with what variability of this finite pulse response size is seen in both directions and across the devices of a crossbar array.
2. Minimal here refers to the smallest increase/decrease of the weight value for a single pulse. Note that for each update multiple pulses can be used, so that the weight increments are (on average) multiples of dw_min. This means that if SGD demands an even smaller weight change, this cannot be obtained by giving a pulse. In that sense it is the minimal change that can be "written" to the weight in a single update. The actual weight change in response to a single pulse is not constant, as there are cycle-to-cycle variations as well (dw_min_std), which means that the exact size changes from pulse to pulse. However, on average it is Δw^d_{ij}. If one additionally averages over devices and directions, the average size is given by dw_min. Note that dw_min = (|dw_min_up| + |dw_min_down|)/2 according to the equation you posted (assuming sigma_d-to-d = 0).
3. Correct, these are weight bounds at which the weight is clipped (hard bounds).
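The relation between dw_min, the directional bias and the pulse-to-pulse variation can be sketched in a few lines of pure Python. The functional form below (mean step proportional to 1 + d·β, multiplicative cycle-to-cycle noise, hard clipping) and the sign convention are assumptions chosen to be consistent with the explanation above; they are not copied from the simulator.

```python
import random

def mean_step(direction, dw_min, beta, dtod_offset=0.0):
    """Average signed step of one device for direction +1 (up) or -1 (down)."""
    return direction * dw_min * (1.0 + direction * beta + dtod_offset)

def apply_pulse(w, direction, dw_min, beta, dw_min_std, b_min, b_max, rng):
    """One pulse: mean step with cycle-to-cycle noise, then hard clipping."""
    step = mean_step(direction, dw_min, beta)
    step *= 1.0 + dw_min_std * rng.gauss(0.0, 1.0)
    return min(max(w + step, b_min), b_max)

dw_min, beta = 0.001, 0.2
up = mean_step(+1, dw_min, beta)     # larger step in the up direction
down = mean_step(-1, dw_min, beta)   # smaller (negative) step going down
recovered_dw_min = (abs(up) + abs(down)) / 2

rng = random.Random(1)
w = 0.0
for _ in range(10):                  # ten noisy up pulses
    w = apply_pulse(w, +1, dw_min, beta, dw_min_std=0.3,
                    b_min=-0.6, b_max=0.6, rng=rng)
print(up, down, recovered_dw_min, w)
```

Averaging the magnitudes of the up and down steps recovers dw_min, which is the relation stated in point 2.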

garystone1 commented 3 years ago

Thanks for the responses, I think I understand the concept of the constant step device. But I have some questions about the other devices. For the linear step device, are gamma up and gamma down the resistances of the positive and negative devices, since the update step of the linear step device depends linearly on the resistance? If so, how can gamma up and gamma down be constant? The conductance of the device, which represents the weight value, is the reciprocal of the resistance. How can gamma be constant when the device can represent a range of weights?

For the exp step device, how did you come up with the equation below? How can we determine the values of the additional parameters in the equation, such as a, b, A_up and A_down?

maljoras commented 3 years ago

For LinearStep, the step size depends linearly on the conductance value (which is assumed to be linearly mapped to the weight value), not on the resistance. The slope of the change of the step size with the conductance value is given by gamma. The slope is constant, but the resulting conductance change depends on the current conductance. Note that with LinearStep we usually model bi-directional devices such as ReRAM, which thus have no separate positive and negative device. Instead, such a device might require a constant reference device that sets the zero point.

For the ExpStepDevice, this is the equation for the scale of the weight change for one pulse given the current weight. It has an exponential dependence on the conductance, as explained here. The gamma (which is different from the gamma in the equation of the LinearStepDevice) and zij are given by the other equations, as explained in the link. The additional parameters can be used to fit the model to a response curve.
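A small sketch may help to see how a constant slope still gives a state-dependent step. It assumes a step size of the form dw_min · (1 + gamma · w), floored at zero, and picks the slopes so that the steps vanish at the bounds (a soft-bounds-like special case). Both the parametrization and the numbers are illustrative assumptions, not the toolkit's definition.

```python
def linear_step(w, direction, dw_min, gamma_up, gamma_down):
    """Signed step for one pulse; the step size changes linearly with w."""
    gamma = gamma_up if direction > 0 else gamma_down
    scale = max(1.0 + gamma * w, 0.0)   # never step in the wrong direction
    return direction * dw_min * scale

b_min, b_max, dw_min = -1.0, 1.0, 0.01
# Slopes chosen so that the up step vanishes at b_max and the down step at b_min
gamma_up, gamma_down = -1.0 / b_max, -1.0 / b_min

step_mid = linear_step(0.0, +1, dw_min, gamma_up, gamma_down)
step_near_max = linear_step(0.9, +1, dw_min, gamma_up, gamma_down)
step_at_max = linear_step(1.0, +1, dw_min, gamma_up, gamma_down)
step_down_at_max = linear_step(1.0, -1, dw_min, gamma_up, gamma_down)
print(step_mid, step_near_max, step_at_max, step_down_at_max)
```

The slopes are fixed numbers, yet the up step shrinks from dw_min at w = 0 to zero at the upper bound, while the down step is largest there.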

We do not provide a fitting tool to automatically fit the devices to existing data (we might do so in future). But you could use the plotting function to examine the effect of the parameters, e.g.

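import matplotlib.pyplot as plt
from aihwkit.utils.visualization import plot_device
from aihwkit.simulator.configs.devices import ExpStepDevice

# Plot the pulse response of an ExpStepDevice. Change A_up, a, etc.
# and re-run to see how each parameter shapes the response curve.
plt.ion()
plot_device(ExpStepDevice(A_up=0.0005, a=0.3))
plt.show()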

garystone1 commented 3 years ago

Thanks for the reply. From your previous responses, I understand that the constant step device can be used to simulate two uni-directional devices and that the linear step device is for bi-directional devices. How do I choose which device class to use to simulate my own device? Do I decide by observing the response function of my device and choosing the class that best fits the properties I observed? Or is there a sample device for each class that can help me make the decision, in the way that ReRAM is often simulated by the linear step device?

I am also curious about the concept of the reference device. Why do we need a reference device that sets the zero point? Can't we just record the value at zero and use it in the device?

maljoras commented 2 years ago

Hi @garystone1, I think I missed your question, sorry about that. The way to fit your own device is to first check whether the dG versus G response curve can be fitted by a constant, linear, exponential or power function. These functions are implemented as the *StepDevices. If you decide, for example, that a linear curve fits, then you can use the LinearStepDevice and set its parameters (e.g. slope, variability, etc.) to match your device. We don't have a fitting tool at the moment (see #291), but you can use plot_device to visualize the pulse response curve.
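Since there is no fitting tool, a manual fit of the linear case could look roughly like the sketch below. The data points are synthetic (made up for illustration), and the mapping from the fitted line to dw_min and gamma assumes a step-size model of the form dG = dw_min · (1 + gamma · G), which is an assumption rather than the toolkit's documented parametrization.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic "measurements": normalized conductance G and the average
# conductance change dG per up pulse observed at that G.
g = [-0.8, -0.4, 0.0, 0.4, 0.8]
dg_up = [0.018, 0.014, 0.010, 0.006, 0.002]

slope, intercept = fit_line(g, dg_up)
dw_min_est = intercept              # step size at G = 0
gamma_up_est = slope / intercept    # relative slope of the step size
print(dw_min_est, gamma_up_est)
```

If the residuals of such a fit are large or clearly curved, that is a hint to try the exponential or power-law device classes instead.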

Regarding the reference device: yes, usually the reference device is modelled implicitly by using normalized conductance units, which can then be negative. However, for some experiments one might want to look at the accuracy drop if the reference device fluctuates or decays. For these cases, there is an abstract device that models the reference device explicitly.