google-research / mixmatch


Reason for ramping up weight of unlabelled loss function (lambda_u). #31

Closed Shubhammawa closed 4 years ago

Shubhammawa commented 4 years ago

According to my understanding, label guessing followed by sharpening can produce erroneous predictions at the start of training, which means the unlabelled loss is not useful during the first few iterations. The labelled loss, however, still produces meaningful updates to the model weights, which eventually leads to better guessed labels, so the unlabelled loss also becomes useful after a few iterations. Can we first train using only the labelled loss, and then introduce the unlabelled loss and train again?

Shubhammawa commented 4 years ago

Also, during training, do we update the guessed labels after each mini-batch, or after a full pass over the dataset, i.e. after every epoch?

carlini commented 4 years ago
  1. We do have a warmup that starts at 0 (only the labeled loss is used) and slowly grows to 1 (the full unlabeled loss is used). This is a soft warmup though, not a hard one as you suggest; see the sketch after this list.

https://github.com/google-research/mixmatch/blob/1011a1d51eaa9ca6f5dba02096a848d1fe3fc38e/mixmatch.py#L58

  2. The label guesses are ephemeral to each mini-batch, and are consistently re-generated with the current model weights.
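
For illustration, here is a minimal sketch of the kind of soft warmup described in point 1. The function name, the ramp length `warmup_steps`, and the maximum weight `lambda_u_max` are assumptions made for the example, not the repository's actual values; the linked line in mixmatch.py is the authoritative implementation.

```python
def unlabeled_loss_weight(step, warmup_steps=16384, lambda_u_max=75.0):
    """Linearly ramp the unlabelled-loss weight from 0 up to lambda_u_max.

    warmup_steps and lambda_u_max are illustrative assumptions; the actual
    schedule lives at the linked line in mixmatch.py.
    """
    ramp = min(max(step / warmup_steps, 0.0), 1.0)  # clip progress to [0, 1]
    return lambda_u_max * ramp

# The per-step objective is then roughly:
#   loss = labeled_loss + unlabeled_loss_weight(step) * unlabeled_loss
# A hard warmup, as proposed in the question, would instead return 0 before
# some cutoff step and lambda_u_max afterwards.
```

Point 2 then means the guessed labels used in the unlabelled term are recomputed inside every training step from the current model weights, rather than being cached and refreshed once per epoch.
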
Shubhammawa commented 4 years ago

Thanks a lot for the reply.