donggong1 / memae-anomaly-detection

MemAE for anomaly detection. -- Gong, Dong, et al. "Memorizing Normality to Detect Anomaly: Memory-augmented Deep Autoencoder for Unsupervised Anomaly Detection". ICCV 2019.
https://donggong1.github.io/anomdec-memae.html
MIT License
463 stars 103 forks

The problem with the Hard Shrinkage operation #10

Open sjp611 opened 4 years ago

sjp611 commented 4 years ago

In the paper, the hard shrinkage operation (Equation 7) can produce exact zeros through the ReLU activation. The result w^ is then used in Equation 9 to minimize its entropy. When minimizing this entropy (Eq. 9), a zero value can end up inside the logarithm, and log(0) is -inf.

How did you solve this problem?
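
For example, a small snippet like this (just an illustration, not code from this repo; w_hat is a made-up set of shrunk weights) reproduces the issue:

```python
import torch

# Hypothetical shrunk attention weights containing an exact zero.
w_hat = torch.tensor([0.7, 0.3, 0.0])
# Entropy term as in Eq. 9: log(0) = -inf, and 0 * (-inf) = nan.
entropy = -(w_hat * torch.log(w_hat)).sum()
print(entropy)  # tensor(nan)
```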

fluowhy commented 4 years ago

Hi sjp611, a practical solution is adding a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
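
For example, a minimal sketch (not taken from this repo; `entropy_loss` and `w_hat` are just illustrative names):

```python
import torch

def entropy_loss(w_hat, eps=1e-10):
    # The small constant inside the log keeps zero entries finite:
    # 0 * log(0 + eps) == 0 instead of nan.
    return -(w_hat * torch.log(w_hat + eps)).sum(dim=-1).mean()

w_hat = torch.tensor([[0.7, 0.3, 0.0]])
print(entropy_loss(w_hat))  # finite, about 0.61
```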

Zk-soda commented 4 years ago

> Hi sjp611, a practical solution is adding a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all the zeros in the weight w is more suitable, so that the entropy loss becomes 0*log(0+1) = 0 for every zero entry in w. Do you think so?

LiUzHiAn commented 4 years ago

Hi,

I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code, but the entropy loss is always ZERO. Any ideas on how to fix this?

Thank you in advance.

fluowhy commented 4 years ago

> Hi sjp611, a practical solution is adding a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
>
> I suppose adding 1 to all the zeros in the weight w is more suitable, so that the entropy loss becomes 0*log(0+1) = 0 for every zero entry in w. Do you think so?

Sorry I didn't respond at the time. I don't think it is a suitable solution, because p is used as a probability distribution (it might not be a proper distribution, though), so adding 1 to it could defeat the purpose of p.

sjp611 commented 4 years ago

> Hi,
>
> I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code, but the entropy loss is always ZERO. Any ideas on how to fix this?
>
> Thank you in advance.

An entropy loss of zero means the attention weights are (close to) one-hot vectors, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax of the attention weights.
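
For example (just a sketch; `att` here is a random stand-in for the real addressing weights from your model):

```python
import torch

# `att` stands in for the (batch, N) addressing weights from the memory module.
att = torch.softmax(torch.randn(8, 2000), dim=-1)
print("min:   ", att.min().item())
print("max:   ", att.max().item())
print("argmax:", att.argmax(dim=-1))
# If the max is ~1 and everything else is ~0, the addressing is one-hot
# and the entropy loss will be (close to) zero.
```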

Wolfybox commented 4 years ago

> Hi sjp611, a practical solution is adding a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
>
> I suppose adding 1 to all the zeros in the weight w is more suitable, so that the entropy loss becomes 0*log(0+1) = 0 for every zero entry in w. Do you think so?

Nah. Adding 1 to all zero entries will cause an issue in the backward pass during training.

LiUzHiAn commented 4 years ago

> Hi, I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code, but the entropy loss is always ZERO. Any ideas on how to fix this? Thank you in advance.
>
> An entropy loss of zero means the attention weights are (close to) one-hot vectors, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax of the attention weights.

Yes, I checked the attention weights before the hard-shrink operation. I found that after the softmax, the attention values are almost identical along the memory-slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how I vary the number of memory slots, the behaviour is the same, and these near-uniform values are always below shrink_threshold whenever I set it to a value in the interval [1/N, 3/N]. Hence every weight is shrunk to zero and the entropy loss ends up being ZERO.
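
A simplified illustration of what I am seeing (using a plain relu(w - threshold) instead of the full Eq. 7 shrinkage, and uniform weights as a stand-in):

```python
import torch
import torch.nn.functional as F

N = 2000
att = torch.softmax(torch.zeros(1, N), dim=-1)  # uniform: every entry == 1/N
thr = 2.0 / N                                   # a shrink_threshold inside [1/N, 3/N]
shrunk = F.relu(att - thr)                      # simplified hard shrinkage
print(att[0, 0].item())                         # ~0.0005, i.e. 1/N
print((shrunk > 0).sum().item())                # 0 -> every weight is shrunk to zero
```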

fluowhy commented 4 years ago

> Hi, I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code, but the entropy loss is always ZERO. Any ideas on how to fix this? Thank you in advance.
>
> An entropy loss of zero means the attention weights are (close to) one-hot vectors, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax of the attention weights.
>
> Yes, I checked the attention weights before the hard-shrink operation. I found that after the softmax, the attention values are almost identical along the memory-slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how I vary the number of memory slots, the behaviour is the same, and these near-uniform values are always below shrink_threshold whenever I set it to a value in the interval [1/N, 3/N]. Hence every weight is shrunk to zero and the entropy loss ends up being ZERO.

You should try training without the entropy loss and see whether your model learns anything without that constraint. If it does, you could try a threshold smaller than 1/N. If not, there may be a bug in your model or data processing. It is really difficult to know exactly what your problem is, but I recommend checking http://karpathy.github.io/2019/04/25/recipe/ . Please don't treat it as a literal recipe, but as a source of empirical tips and tricks for debugging your model.
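
Something like this (just a sketch, the names are illustrative) lets you switch the constraint on and off:

```python
import torch

def total_loss(x, x_rec, w_hat, entropy_weight=0.0002, eps=1e-10):
    # Reconstruction error plus a weighted entropy term (the paper uses a
    # small weight on the order of 0.0002, if I recall correctly).
    # Set entropy_weight = 0 to train without the sparsity constraint.
    rec = torch.mean((x - x_rec) ** 2)
    ent = -(w_hat * torch.log(w_hat + eps)).sum(dim=-1).mean()
    return rec + entropy_weight * ent
```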