LiJunnan1992 / DivideMix

Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
MIT License

Cannot get the correct image of GMM's result in the setting cifar10_asym_0.4 #33

Closed: wangkiw closed this issue 3 years ago

wangkiw commented 3 years ago

Hi, thanks for your idea and code! I want to check the loss distribution in the DivideMix pipeline, so I am trying to plot a figure like the one in the paper:

[paper figure: loss distribution of clean vs. noisy samples]

My plots for cifar10-sym-0.5/0.8 look right (e.g. cifar10-sym-0.5, Epoch 13):

[screenshot: cifar10-sym-0.5, Epoch 13]

But the plot looks wrong in the cifar10-asym-0.4 setting (lambda_u=0, initial learning rate=0.02, batch_size=128, warm_up=10 epochs, p_threshold=0.5), e.g. at Epoch 10 and Epoch 14:

[screenshots: cifar10-asym-0.4, Epoch 10 and Epoch 14]

I didn't change the DivideMix implementation and I use conf_penalty (noise_mode=asym). My plotting method is as follows: first I save the noise indices in dataloader_cifar.py:

noise_label = []
idx = list(range(50000))                  # indices of all CIFAR-10 training samples
random.shuffle(idx)
num_noise = int(self.r*50000)             # number of samples selected for label noise
noise_idx = idx[:num_noise]
# save the selected indices so they can be reloaded later for plotting
np.save('noiseidx_%s_%.1f.npy'%(noise_mode,r),np.array(noise_idx))
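As a minimal sketch of a possible extension to the snippet above (not part of the repo), the same place in dataloader_cifar.py could also dump the generated labels, so one can later check which entries of noise_idx actually ended up with a changed label. This assumes the save calls are placed after noise_label has been fully built and that noise_label and train_label are in scope as in the original file; the noiselabel_*/cleanlabel_* file names are hypothetical.

# optional, for later analysis (place after noise_label has been filled):
# store the generated labels and the original labels alongside the indices
np.save('noiselabel_%s_%.1f.npy'%(noise_mode,r), np.array(noise_label))   # hypothetical file name
np.save('cleanlabel_%s_%.1f.npy'%(noise_mode,r), np.array(train_label))   # hypothetical file name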

and plot the GMM's result in Train_cifar.py:

pred1 = (prob1 > args.p_threshold)      # per-sample "clean" prediction from net1's GMM
pred2 = (prob2 > args.p_threshold)      # per-sample "clean" prediction from net2's GMM

all_idx = list(range(50000))
noisy_idx = np.load('noiseidx_%s_%.1f.npy'%(args.noise_mode,args.r)).tolist()
noisy_set = set(noisy_idx)              # set lookup is much faster than a list lookup
clean_idx = [i for i in all_idx if i not in noisy_set]
clean_loss = all_loss[0][-1][clean_idx].numpy()   # losses of samples not selected for noise
noisy_loss = all_loss[0][-1][noisy_idx].numpy()   # losses of samples selected for label noise

import matplotlib.pyplot as plt

# empirical pdf of the per-sample loss, clean vs. noisy samples
plt.hist(clean_loss, bins=100, density=True, alpha=0.5, histtype='stepfilled', color="lightsteelblue", label='clean')
plt.hist(noisy_loss, bins=100, density=True, alpha=0.5, histtype='stepfilled', color="pink", label='noisy')

plt.title('Epoch %d' % epoch)
plt.legend(loc='upper right')
plt.xlabel('Normalized loss')
plt.ylabel('Empirical pdf')
svgname = 'epoch_' + str(epoch) + '.svg'
svg_path = os.path.join(GMM_imgs_path, svgname)
plt.savefig(svg_path)
plt.cla()
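For reference, the posterior behind prob1/prob2 can also be recomputed offline from the same losses with a short sketch like the one below. It assumes it runs in the same scope as the snippet above (all_loss and args available), and the GaussianMixture settings are an approximation of what eval_train in Train_cifar.py does, not the exact values:

from sklearn.mixture import GaussianMixture

losses = all_loss[0][-1].numpy().reshape(-1, 1)        # latest per-sample losses of net1
gmm = GaussianMixture(n_components=2, reg_covar=5e-4)  # two components: clean vs. noisy losses
gmm.fit(losses)
# posterior of the lower-mean component = probability of being a clean sample
prob = gmm.predict_proba(losses)[:, gmm.means_.argmin()]
pred_clean = prob > args.p_threshold                   # same thresholding as pred1/pred2

Plotting a histogram of prob for the ground-truth clean and noisy indices, in addition to the raw losses, can make it easier to see whether the two GMM components actually separate the two groups.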

Can you give me some advice? Thanks~~

LiJunnan1992 commented 3 years ago

Hi,

Can you reproduce the paper's results for the asymmetric noise?

Also, the batch size argument should be set to 64 (the default).

wangkiw commented 3 years ago

@LiJunnan1992 Thanks for the reply! I have run 71 epochs (test accuracy: 90.32%); the test accuracy and loss look OK, but the GMM results still look bad :(

[screenshot: Epoch 71]

Thanks for the tip, I will try batch_size=64 later~