mnskim closed this issue 3 years ago
Hi,
The infor_loss is negative because we want to minimize the negative entropy of the mask variable (equivalent to maximizing the entropy H(R|x)). You could multiply the infor_loss by -1, as long as you are still maximizing the entropy term in your final objective.
Best, Hanjie
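To make the sign convention concrete, here is a minimal sketch in plain Python, not the repository's actual code. It assumes each mask variable is Bernoulli with a keep probability, and the names `bernoulli_entropy`, `mean_entropy`, `objective_neg_entropy`, and `objective_pos_entropy` are hypothetical. It shows that adding the negative entropy and subtracting the positive entropy give the same objective to minimize.

```python
import math

def bernoulli_entropy(p):
    """Entropy in nats of a Bernoulli(p) mask variable (assumed form of the mask)."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # treat 0 * log(0) as 0
            h -= q * math.log(q)
    return h

def mean_entropy(mask_probs):
    """Average entropy over all mask variables."""
    return sum(bernoulli_entropy(p) for p in mask_probs) / len(mask_probs)

def objective_neg_entropy(task_loss, mask_probs, beta):
    """Convention A: infor_loss is the negative entropy and is ADDED to the task loss."""
    infor_loss = -mean_entropy(mask_probs)
    return task_loss + beta * infor_loss

def objective_pos_entropy(task_loss, mask_probs, beta):
    """Convention B: infor_loss is multiplied by -1, so it must be SUBTRACTED."""
    infor_loss = mean_entropy(mask_probs)
    return task_loss - beta * infor_loss

if __name__ == "__main__":
    probs = [0.5, 0.9, 0.2]  # hypothetical mask probabilities
    a = objective_neg_entropy(0.7, probs, beta=0.3)
    b = objective_pos_entropy(0.7, probs, beta=0.3)
    print(a, b)  # the two conventions agree
```

Under these assumptions, flipping the sign of the term and then adding it to a positive task loss would instead minimize the entropy, which is the opposite of the intended objective. The annealing direction of beta is a separate choice that does not affect this equivalence.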
Ah, I had misunderstood the original loss - Thank you for the clarification!!
No problem!
Hello,
Thank you for a great code release! I'm currently trying to use the VMASK layer with BERT as an addition to my own code.
I've made the necessary changes to the Hugging Face transformers code and have been trying to train. I noticed that you seem to use a negative loss term for the infor_loss and anneal beta from 1 down to a lower value. Because the model loss term in my setting is also positive, I multiplied the infor_loss by -1 and changed the annealing weight beta to grow, i.e. 0 -> 1. Would this change to the loss function work?