LiJunnan1992 / DivideMix

Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
MIT License

Actual noise rate regarding symmetric noise #29

Closed: ChanLIM closed this issue 3 years ago

ChanLIM commented 3 years ago

In the case of symmetric noise, it seems to me that some labels that were intended to be corrupted are not actually corrupted.

Let's take 50% symmetric noise in CIFAR-10 (10 classes) as an example. The code intends to apply noise to 25000 out of 50000 instances, but about 10% of those 25000 randomly relabeled samples (since there are 10 classes) will be mapped back to their original labels, leaving only roughly 22500 noisy labels. Because of this, 50% symmetric noise in CIFAR-10 actually ends up as a 45% noise rate (49.5% in CIFAR-100).
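
The arithmetic can be sanity-checked with a tiny sketch (effective_noise_rate is a made-up helper name for illustration, not something from the repo): a label picked for corruption is redrawn uniformly over all classes, so it lands back on its original class with probability 1/num_classes.

def effective_noise_rate(intended_rate, num_classes):
    # A selected label is redrawn uniformly over all classes, so with
    # probability 1/num_classes it ends up equal to the original label.
    return intended_rate * (1 - 1 / num_classes)

print(effective_noise_rate(0.5, 10))   # CIFAR-10:  0.45
print(effective_noise_rate(0.5, 100))  # CIFAR-100: 0.495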

Adding the following lines to dataloader_cifar.py makes the noise rate exactly 50%. https://github.com/LiJunnan1992/DivideMix/blob/d9d3058fa69a952463b896f84730378cdee6ec39/dataloader_cifar.py#L68

while True:
    noiselabel = random.randint(0, 9)
    if train_label[i] != noiselabel:
        break
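
For context, here is a minimal sketch of how that rejection loop would sit inside a symmetric-noise injection routine. The surrounding function (inject_symmetric_noise) is a simplified stand-in for the actual loop in dataloader_cifar.py, not a verbatim excerpt:

import random

def inject_symmetric_noise(train_label, noise_rate, num_classes=10):
    # Pick noise_rate * N indices to corrupt, then resample each chosen
    # label until it differs from the original, so every selected sample
    # is genuinely corrupted.
    num_samples = len(train_label)
    noise_idx = random.sample(range(num_samples), int(noise_rate * num_samples))
    noise_label = list(train_label)
    for i in noise_idx:
        while True:
            noiselabel = random.randint(0, num_classes - 1)
            if train_label[i] != noiselabel:
                break
        noise_label[i] = noiselabel
    return noise_label

With this change, exactly int(noise_rate * N) labels differ from the originals, so 50% symmetric noise really means a 50% noise rate.
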
LiJunnan1992 commented 3 years ago

Hi, thanks for your interest in DivideMix!

You are correct that the injected noise ratio does not equal the "true" noise ratio. We have specified the difference between these two noise injection methods in Section 4.1, and reported results with the "true" noise ratio in Table 6.

ChanLIM commented 3 years ago

Got it. Thanks