Zhaoxian-Wu opened this issue 7 months ago
Thanks for raising this issue. perfect_bias is indeed an "old" parameter setting that should only be relevant for analog_bias. Since we now have digital_bias, it should actually be deleted. It has nothing to do with shared_weights, so it is not relevant here.
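To make the analog_bias / digital_bias distinction concrete, here is a minimal plain-Python sketch (not aihwkit code; all function names are hypothetical). It assumes that an analog bias is stored as an extra last column of the tile's weight matrix and driven by a constant input of 1, so it is clipped to [w_min, w_max] like any other weight, whereas a digital bias is added in floating point after the analog matrix-vector product.

```python
# Hypothetical sketch (plain Python, not aihwkit): analog bias stored in the tile
# as an extra last column vs. digital bias added outside the tile.


def clip(value, lo, hi):
    """Clamp a value into the device range [lo, hi]."""
    return max(lo, min(hi, value))


def forward_analog_bias(weights, bias, x, w_min, w_max):
    """Bias lives in the tile: append it as a last column and feed a constant 1."""
    outputs = []
    for row, b in zip(weights, bias):
        # The augmented row [w_0, ..., w_{n-1}, b] is subject to the device bounds.
        augmented = [clip(w, w_min, w_max) for w in row + [b]]
        x_aug = x + [1.0]
        outputs.append(sum(w * xi for w, xi in zip(augmented, x_aug)))
    return outputs


def forward_digital_bias(weights, bias, x, w_min, w_max):
    """Bias lives outside the tile: only weights are bounded, bias added in floating point."""
    outputs = []
    for row, b in zip(weights, bias):
        clipped = [clip(w, w_min, w_max) for w in row]
        outputs.append(sum(w * xi for w, xi in zip(clipped, x)) + b)
    return outputs


weights = [[0.5, -0.2]]
bias = [2.0]
x = [1.0, 1.0]
print(forward_analog_bias(weights, bias, x, -1.0, 1.0))
print(forward_digital_bias(weights, bias, x, -1.0, 1.0))
```

With w_min = -1 and w_max = 1, the bias of 2.0 is clipped to 1.0 when it sits in the analog tile, but is kept exactly when it is applied digitally.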
I see. It seems that using digital_bias instead is a more natural solution. But what does shared_weights do? Does that mean multiple tiles share the same torch array?
Shared weights means that the memory of the tile is handled by torch (and not from within C++). This also means that the backward pass, etc., is handled by torch. Note that the RPUCuda library is capable of handling the memory of the tile arrays and data internally (as it is a standalone library that can also be used independently of PyTorch).
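The ownership difference described above can be illustrated with a small, hypothetical Python sketch; these classes are not part of aihwkit or RPUCuda. A nested list stands in for a torch tensor: SharedTile only keeps a reference to memory owned by the caller, while OwnedTile copies the data into storage it manages itself, roughly as RPUCuda does when it handles the tile memory internally.

```python
# Hypothetical sketch of weight-memory ownership (not aihwkit/RPUCuda code).
# A nested list stands in for a torch tensor owned by the framework.


class OwnedTile:
    """Tile that copies the weights and manages its own buffer."""

    def __init__(self, weights):
        self._weights = [list(row) for row in weights]  # private copy

    def get_weights(self):
        return [list(row) for row in self._weights]


class SharedTile:
    """Tile that only references an externally owned buffer."""

    def __init__(self, weights):
        self._weights = weights  # no copy: same object as the caller's buffer

    def get_weights(self):
        return self._weights


external = [[0.1, 0.2], [0.3, 0.4]]
owned = OwnedTile(external)
shared = SharedTile(external)

# An update performed by the "framework" on its own buffer...
external[0][0] = 0.9

# ...is seen by the shared tile but not by the tile holding its own copy.
print(owned.get_weights()[0][0])
print(shared.get_weights()[0][0])
```

This only shows who owns the buffer; it does not imply that several aihwkit tiles actually share one torch array.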
I see. Thanks for your kind explanation :D
@maljoras, do we need to remove perfect_bias from the code flow when we are using digital bias? What do you suggest here? It looks like we have a bug we need to solve.
I think this could be moved to a new issue, @kaoutar55. Since this issue was opened because of a problem that turned out to be a conceptual bug, we can open a separate discussion about perfect_bias if you like, @maljoras, and close this issue, because the original problem was actually solved. At least that was my impression; correct me if I'm wrong, @Zhaoxian-Wu.
@Zhaoxian-Wu, please look at this and try it on your end with the suggested changes. Let us know if the issue is resolved.
Description
As discussed in #604, the model weights sometimes fall outside [w_min, w_max].
Bug Pinpoint
The bug happens because of the incorrect initialization of w_min_bound_ and w_max_bound_ (see the code). It seems that this code is designed to handle the situation where shared weights are used and PulsedDevice.perfect_bias is turned on. When PulsedDevice.perfect_bias is on, the last dimension of the weights is incorrectly amplified by a factor of 100, yielding incorrect active regions and weights.
TODO
I was trying to fix the bug directly, but I found that I couldn't control shared_weights through the AnalogLinear initialization. I guess we should design a flag here to better control the shared-weights behavior.
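The mechanism described under Bug Pinpoint can be reproduced in a small plain-Python sketch. It is not the RPUCuda code: w_min_bound and w_max_bound only loosely mirror the C++ members w_min_bound_ and w_max_bound_, and the way the factor of 100 is applied to the last column is an assumption based on the description. The weight matrix is assumed to have no analog bias column, so its last column holds ordinary weights that should stay within [w_min, w_max].

```python
# Hypothetical sketch of the reported bounds issue (plain Python, not RPUCuda).
# w_min_bound / w_max_bound loosely mirror the C++ members w_min_bound_ and
# w_max_bound_; the factor of 100 on the last column is an assumption taken
# from the bug description.

PERFECT_BIAS_SCALE = 100.0  # assumed widening factor for the last column


def init_bounds(d_size, x_size, w_min, w_max, perfect_bias):
    """Build per-element lower/upper bound matrices of shape d_size x x_size."""
    w_min_bound = [[w_min] * x_size for _ in range(d_size)]
    w_max_bound = [[w_max] * x_size for _ in range(d_size)]
    if perfect_bias:
        # The last column is treated as an analog bias and gets much wider bounds,
        # even if it actually holds ordinary weights.
        for j in range(d_size):
            w_min_bound[j][x_size - 1] = w_min * PERFECT_BIAS_SCALE
            w_max_bound[j][x_size - 1] = w_max * PERFECT_BIAS_SCALE
    return w_min_bound, w_max_bound


def apply_bounds(weights, w_min_bound, w_max_bound):
    """Clamp every weight into its own [lower, upper] bound."""
    return [
        [max(lo, min(hi, w)) for w, lo, hi in zip(row, lo_row, hi_row)]
        for row, lo_row, hi_row in zip(weights, w_min_bound, w_max_bound)
    ]


def count_out_of_range(weights, w_min, w_max):
    """Count weights that lie outside the nominal range [w_min, w_max]."""
    return sum(1 for row in weights for w in row if w < w_min or w > w_max)


# A 2 x 2 weight matrix without an analog bias column (e.g. digital bias is used),
# so the last column holds ordinary weights that should stay in [-1, 1].
weights = [[0.5, 3.0], [-0.4, -5.0]]

for perfect_bias in (True, False):
    lo, hi = init_bounds(2, 2, -1.0, 1.0, perfect_bias)
    bounded = apply_bounds(weights, lo, hi)
    print(perfect_bias, count_out_of_range(bounded, -1.0, 1.0))
```

With perfect_bias on, the two last-column weights (3.0 and -5.0) keep values outside [-1, 1]; with it off, both are clipped into range.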