LiJunnan1992 / DivideMix

Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
MIT License

unlabeled set empty during training #18

Closed stellaywu closed 4 years ago

stellaywu commented 4 years ago

Thanks for the nice implementation!

I am trying to run it on my own dataset, which is slightly imbalanced binary data (about 3:1). During training, after a few steps, the unlabeled set becomes too small (0 or 1 samples) and training errors out. Should I increase p_threshold to encourage more samples in the unlabeled set, or do some other tuning? Does the method work on imbalanced data? I'm also not sure it is learning, because the unlabeled losses stay very small the whole time. This is how the loss goes in the first few steps:

Labeled loss: 0.63  Unlabeled loss: 0.04
Labeled loss: 0.65  Unlabeled loss: 0.03
Labeled loss: 0.60  Unlabeled loss: 0.03
Labeled loss: 0.63  Unlabeled loss: 0.01
Labeled loss: 0.65  Unlabeled loss: 0.02
Labeled loss: 0.30  Unlabeled loss: 0.00
Labeled loss: 0.22  Unlabeled loss: 0.02
Labeled loss: 0.63  Unlabeled loss: 0.03

Thanks so much!

LiJunnan1992 commented 4 years ago

Hi, thanks for trying our method!

Yes, increasing p_threshold will put more samples in the unlabeled set. You may also try adjusting the unsupervised loss weight.
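To illustrate why raising p_threshold grows the unlabeled set: DivideMix keeps a sample in the labeled set when its per-sample "clean" probability (from the GMM fit on training losses) exceeds p_threshold, and sends the rest to the unlabeled set. A toy sketch with random stand-in probabilities (not the repo's code):

```python
import numpy as np

# Stand-in for the GMM clean-probability per training sample.
rng = np.random.default_rng(0)
clean_prob = rng.uniform(0.0, 1.0, size=1000)

def split_counts(p_threshold):
    """Count (labeled, unlabeled) samples under a given threshold."""
    labeled = int((clean_prob > p_threshold).sum())
    return labeled, len(clean_prob) - labeled

lab_lo, unlab_lo = split_counts(0.5)
lab_hi, unlab_hi = split_counts(0.7)

# A higher threshold admits fewer samples to the labeled set,
# so more end up in the unlabeled set.
assert unlab_hi > unlab_lo
```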

You can also try a re-sampling approach to deal with the imbalanced data.
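One simple re-sampling scheme for a 3:1 binary split is inverse-frequency over-sampling: weight each sample by the reciprocal of its class count, then draw indices with replacement. A minimal sketch with illustrative names (in PyTorch the same idea is what `torch.utils.data.WeightedRandomSampler` implements):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.array([0] * 750 + [1] * 250)  # ~3:1 imbalance, as in the question

counts = np.bincount(labels)
weights = 1.0 / counts[labels]            # rarer class -> larger weight
probs = weights / weights.sum()

# Draw one epoch's worth of indices with replacement.
idx = rng.choice(len(labels), size=len(labels), replace=True, p=probs)
resampled = labels[idx]
# Class frequencies in `resampled` should now be roughly 1:1.
```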

stellaywu commented 4 years ago

Thanks! Just a follow-up question. We have this:

def eval_train(model,all_loss):    
    model.eval()
    losses = torch.zeros(50000)    

Is it true that 50000 needs to be adjusted to equal the total number of training samples?

Thanks!

LiJunnan1992 commented 4 years ago

Yes!
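A minimal sketch of how to avoid the hard-coded constant, assuming an `eval_loader` that wraps the full training set (the loader name follows the repo's convention; the toy dataset here is illustrative):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Illustrative dataset with an arbitrary length."""
    def __len__(self):
        return 123
    def __getitem__(self, i):
        return i

eval_loader = DataLoader(ToyDataset(), batch_size=32)

# Size the per-sample loss buffer from the dataset rather than
# hard-coding 50000 (the CIFAR training-set size):
losses = torch.zeros(len(eval_loader.dataset))
```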