The network architecture weights `alphas_network` are initialized to non-zero values even for connections that do not exist. For example, the first layer has only two connections, but three weights are initialized. This causes problems in the softmax: the unused weights are always non-zero, so they still take a share of the normalized weight even though they are never used.
Therefore the values for all unused connections should be set to zero, and only the non-zero values should be considered when calculating the softmax. I would suggest adding some kind of masking to the `alphas_network` parameters.
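To illustrate the kind of masking I mean, here is a minimal sketch in plain Python. It is not taken from the repository: `masked_softmax`, `alphas_layer0` and `mask_layer0` are hypothetical names, and the example values are made up. It assumes a per-layer boolean mask that marks which connections actually exist.

```python
import math


def masked_softmax(alphas, mask):
    """Softmax over the entries where mask is True; masked entries get weight 0."""
    if len(alphas) != len(mask):
        raise ValueError("alphas and mask must have the same length")
    active = [a for a, m in zip(alphas, mask) if m]
    if not active:
        raise ValueError("mask must keep at least one connection")
    peak = max(active)  # subtract the max of the active entries for numerical stability
    # Masked entries contribute exactly 0 to the numerator and the denominator.
    exps = [math.exp(a - peak) if m else 0.0 for a, m in zip(alphas, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical first layer: three alphas are stored, but only two connections exist.
alphas_layer0 = [0.001, 0.002, 0.003]
mask_layer0 = [True, True, False]
weights = masked_softmax(alphas_layer0, mask_layer0)
```

With this mask the third weight is exactly zero and the first two sum to one. In a tensor framework the same effect is usually obtained by filling the masked positions with negative infinity before the softmax, rather than with zero, since a zero logit still receives a non-zero softmax weight.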