Let's talk about this issue. FixMatch uses WRN-28-2 (about 1.5M parameters) and processes 960 samples at each iteration: 64 labeled samples, 64×7 unlabeled samples with weak augmentations, and 64×7 unlabeled samples with strong augmentations.
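For concreteness, the per-iteration sample count works out as below; a minimal sketch where the variable names (`B`, `mu`, etc.) are mine, not the repo's:

```python
# Illustrative arithmetic for FixMatch's per-iteration batch (names are mine).
B = 64    # labeled batch size
mu = 7    # unlabeled-to-labeled ratio

labeled = B           # 64 labeled samples
weak = mu * B         # 448 weakly augmented unlabeled samples
strong = mu * B       # 448 strongly augmented views of the same unlabeled samples
total = labeled + weak + strong
print(total)          # 960 samples through WRN-28-2 per iteration
```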
What is the total mini-batch size of MixMatch per iteration?
Oh, I have checked the code. It uses 128 samples (64 labeled, 64 unlabeled) at each iteration. The speed gap is quite reasonable, because FixMatch (5-6 min) processes 7.5× more samples per iteration than MixMatch (1 min).
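To sanity-check the timing against the sample counts (illustrative arithmetic only; the timings are the ones reported above):

```python
# Rough check: does the per-iteration sample ratio explain the timing gap?
fixmatch_per_iter = 64 + 2 * 7 * 64   # 960, from the breakdown above
mixmatch_per_iter = 64 + 64           # 128
ratio = fixmatch_per_iter / mixmatch_per_iter
print(ratio)                          # 7.5

mixmatch_min_per_1k_iters = 1.0       # reported above
print(ratio * mixmatch_min_per_1k_iters)  # ~7.5 min predicted; observed 5-6 min is in the same ballpark
```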
Thanks for your quick reply! I see that the uratio parameter in your code represents $\mu$ in the paper. And from section B.5, Ratio of Labeled to Unlabeled Data in Minibatch, can we conclude that setting $\mu$ to 8 is enough to achieve a small error rate?
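For anyone following along, this is how I read the `uratio` wiring; a minimal sketch where everything except the `uratio` name is my assumption, not the repo's actual API:

```python
# Hypothetical sketch: uratio (= mu in the paper) scales the unlabeled batch.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=64)  # labeled batch size B (flag name assumed)
parser.add_argument('--uratio', type=int, default=7)       # mu: unlabeled-to-labeled ratio
args = parser.parse_args([])  # use defaults for illustration

labeled_bs = args.batch_size                  # 64 labeled samples per iteration
unlabeled_bs = args.batch_size * args.uratio  # 448 unlabeled samples per iteration
```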
@hkunzhe In Figure 3(a) of the original paper, you can find the ablation study on the unlabeled data ratio. The authors report that $\mu=8$ gives the smallest error with or without learning rate scaling.
@LeeDoYup Thanks for your patient reply! Following the ablation study on the unlabeled data ratio with different learning rate scaling strategies, I experimented with $\mu=1$ (64 labeled and 64 unlabeled samples per iteration). Since $\eta=0.03$ when $\mu=7$ (64 labeled and 64×7 unlabeled samples), I set $\eta=0.01$ (it should be 0.0075 exactly; is that correct?) for $\mu=1$. I achieved a better result than MixMatch under a similar setting. The training time and GPU memory are also similar with $\mu=1$.
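For reference, here is the scaling rule I assumed, namely that $\eta$ scales linearly with the total batch size $64(1+\mu)$; the helper `scaled_lr` is mine, not from either codebase:

```python
# Linear LR scaling with total batch size 64 * (1 + mu) (my assumption).
def scaled_lr(base_lr, base_mu, new_mu, B=64):
    base_batch = B * (1 + base_mu)  # 512 when mu = 7
    new_batch = B * (1 + new_mu)    # 128 when mu = 1
    return base_lr * new_batch / base_batch

print(scaled_lr(0.03, base_mu=7, new_mu=1))  # 0.0075
```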
Thanks for the experiments. I want to check whether the training time and GPU memory are similar to the MixMatch setting or to FixMatch with $\mu=7$.
Moreover, if you could share the training setting (which dataset, and the number of labeled samples), that information would help other people interpret the ablation experiment!
Thanks for providing the well-documented code! It seems that every 1000 iterations take about 5-6 minutes (on a single NVIDIA 2080 Ti GPU). As for MixMatch, I used the code here, and every 1000 iterations take only 1 minute.
In fact, MixMatch also uses consistency regularization, and the number of training iterations is the same as FixMatch's. What do you think causes the slower training of FixMatch compared to MixMatch?