Why use 5 times more unlabeled data?

YyzHarry / imbalanced-semi-self

[NeurIPS 2020] Semi-Supervision (Unlabeled Data) & Self-Supervision Improve Class-Imbalanced / Long-Tailed Learning

https://arxiv.org/abs/2006.07529

MIT License


Why use 5 times more unlabeled data? #9

Closed (HeewonChung92 closed 9 months ago)

HeewonChung92 commented 3 years ago

I read the paper and have a question about Appendix E3: Effect of Unlabeled Data Amount.

The results of CE+Du are 21.75, 20.35, 18.36, and 16.88 for {0.5x, 1x, 5x, 10x} unlabeled data, respectively. The result with 10x unlabeled data is better than the result with 5x, but the paper uses 5x.

Is there a reason?

YyzHarry commented 3 years ago

Hi, thank you for your interest. There is no specific reason. In the main paper, the results mainly focus on the effect of the imbalance of the labeled/unlabeled data, and the amount of unlabeled data is a control variable (and thus fixed), so we simply picked an arbitrary value, 5x.
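For readers following the numbers in the question, here is a minimal sketch of the arithmetic. It assumes the quoted CE+Du values are test error rates in percent, so lower is better, which is how the question reads them. The names `results` and `pairwise_gains` are made up for illustration and are not part of the repository.

```python
# Values quoted in the question for CE+Du, keyed by the unlabeled-data ratio.
# Assumption: these are test error rates in percent (lower is better).
results = {0.5: 21.75, 1.0: 20.35, 5.0: 18.36, 10.0: 16.88}


def pairwise_gains(table):
    """Return (from_ratio, to_ratio, error_drop) for consecutive ratios."""
    ratios = sorted(table)
    return [
        (a, b, round(table[a] - table[b], 2))
        for a, b in zip(ratios, ratios[1:])
    ]


gains = pairwise_gains(results)
for a, b, drop in gains:
    print(f"{a}x -> {b}x: error drops by {drop:.2f} points")

# Ratio with the lowest reported error among the four settings.
best_ratio = min(results, key=results.get)
print(f"lowest error at {best_ratio}x")
```

Under that assumption the error keeps falling as more unlabeled data is added, which matches the observation in the question. It does not conflict with the reply: the main experiments hold the amount fixed so that only the imbalance of the data varies.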