Question about using info loss term and annealing

mnskim commented 3 years ago

Hello,

Thank you for a great code release! I'm currently trying to use the VMASK layer with BERT as an addition to my own code.

I've made the necessary changes to the huggingface transformers code and have been trying to train. I noticed that you seem to use a negative loss term for infor_loss and anneal beta from 1 down to a lower value. Because the model loss term in my setting is also positive, I multiplied the infor loss by -1 and changed the annealing weight beta to grow, i.e. 0 -> 1. Would this change to the loss function work?

HanjieChen commented 3 years ago

Hi,

The infor_loss is negative because we want to minimize the negative entropy of the mask variable (equivalent to maximizing the entropy H(R|x)). You could multiply the infor loss by -1, as long as you are still maximizing the entropy term in your final objective.

Best, Hanjie
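
To make the sign convention concrete, here is a minimal sketch in plain Python. It is not taken from the VMASK repository: the function names, the Bernoulli form of the mask entropy, the fixed beta, and the example numbers are all assumptions chosen for illustration, and the annealing schedule is left out. It only shows that adding a negative-entropy term and subtracting a positive-entropy term give the same objective, and that in both cases a higher-entropy mask lowers the loss.

```python
import math


def bernoulli_entropy(p, eps=1e-8):
    """Entropy (in nats) of a binary mask variable with keep-probability p."""
    return -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))


def mean_entropy(keep_probs):
    """Average entropy over all mask variables (hypothetical helper)."""
    return sum(bernoulli_entropy(p) for p in keep_probs) / len(keep_probs)


def objective_negative_term(task_loss, keep_probs, beta):
    # Convention A (assumed): infor_loss is the NEGATIVE entropy and is added.
    infor_loss = -mean_entropy(keep_probs)
    return task_loss + beta * infor_loss


def objective_positive_term(task_loss, keep_probs, beta):
    # Convention B (assumed): the term is the positive entropy and is subtracted.
    entropy = mean_entropy(keep_probs)
    return task_loss - beta * entropy


# Illustrative values only, not results from any experiment.
task_loss = 0.7
beta = 0.5
confident_mask = [0.99, 0.01, 0.98]  # low entropy
uncertain_mask = [0.5, 0.5, 0.5]     # high entropy

loss_a = objective_negative_term(task_loss, uncertain_mask, beta)
loss_b = objective_positive_term(task_loss, uncertain_mask, beta)
loss_confident = objective_negative_term(task_loss, confident_mask, beta)

print("convention A:", loss_a)
print("convention B:", loss_b)
print("confident mask:", loss_confident)
```

If the sign of the term is flipped without also flipping how it enters the total (for example, adding the positive entropy), the optimizer would minimize the entropy instead, which is the opposite of the behaviour described above.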

mnskim commented 3 years ago

Ah, I had misunderstood the original loss. Thank you for the clarification!

HanjieChen commented 3 years ago

No problem!

UVa-NLP / VMASK