google-research / mixmatch


Reason for ramping up weight of unlabelled loss function (lambda_u). #31

Closed Shubhammawa closed 4 years ago

Shubhammawa commented 4 years ago

According to my understanding, label guessing followed by sharpening can produce erroneous predictions at the start of training, which means the unlabelled loss is not useful during the first few iterations. The labelled loss, however, still produces meaningful updates to the model weights, which eventually leads to better guessed labels, so the unlabelled loss also becomes useful after a few iterations. Can we first train using only the labelled loss, and then introduce the unlabelled loss and train again?

Shubhammawa commented 4 years ago

Also, during training, do we update the guessed labels after each mini-batch, or after a full pass over the dataset, i.e. after every epoch?

carlini commented 4 years ago
  1. We do have a warmup that starts at 0 (only the labeled loss is used) and slowly grows to 1 (the full unlabeled loss is used). This is a soft warmup though, not a hard one as you suggest; see the sketch after this list.

https://github.com/google-research/mixmatch/blob/1011a1d51eaa9ca6f5dba02096a848d1fe3fc38e/mixmatch.py#L58

  2. The label guesses are ephemeral to each mini-batch, and are consistently re-generated with the current model weights.
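
For illustration, here is a minimal sketch of the kind of soft warmup described in point 1. The function name, the ramp length `warmup_steps`, and the maximum weight `lambda_u_max` are assumptions made for the example, not the repository's actual values; the linked line in mixmatch.py is the authoritative implementation.

```python
def unlabeled_loss_weight(step, warmup_steps=16384, lambda_u_max=75.0):
    """Linearly ramp the unlabelled-loss weight from 0 up to lambda_u_max.

    warmup_steps and lambda_u_max are illustrative assumptions; the actual
    schedule lives at the linked line in mixmatch.py.
    """
    ramp = min(max(step / warmup_steps, 0.0), 1.0)  # clip progress to [0, 1]
    return lambda_u_max * ramp

# The per-step objective is then roughly:
#   loss = labeled_loss + unlabeled_loss_weight(step) * unlabeled_loss
# A hard warmup, as proposed in the question, would instead return 0 before
# some cutoff step and lambda_u_max afterwards.
```

Point 2 then means the guessed labels used in the unlabelled term are recomputed inside every training step from the current model weights, rather than being cached and refreshed once per epoch.
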
Shubhammawa commented 4 years ago

Thanks a lot for the reply.