The flags for the fully-supervised baselines choose mean_teacher for the consistency model, intended as a no-op because the consistency weight is set to zero. But the hparams for Mean Teacher dictate a different learning rate than the best one we found for fully-supervised (and also the one we state we used in the paper).
The flags for the fully-supervised baselines choose
mean_teacher
for the consistency model, intended as a no-op because the consistency weight is set to zero. But the hparams for Mean Teacher dictate a different learning rate than the best one we found for fully-supervised (and also the one we state we used in the paper).