hhhhnwl opened this issue 2 years ago
The idea here is to add entropy minimization as a penalty term, right? But as currently written, the larger `pred_mean` is, the smaller the loss becomes. Or is there some other consideration?

Take an extreme example: assume 3 balanced classes and 3 samples per batch. The softmax output of batch1 is [[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333]], and the softmax output of batch2 is [[1,0,0],[0,1,0],[0,0,1]]. The penalties of these two batches are almost equal. Isn't that inconsistent with what you would expect?
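A minimal sketch of that example, computing the penalty term with a uniform prior (plain `torch`, independent of the repo's training code):

```python
import torch

prior = torch.ones(3) / 3                      # uniform prior over 3 balanced classes

# batch1: every sample is maximally uncertain
probs1 = torch.full((3, 3), 1.0 / 3)
# batch2: every sample is fully confident, one class each
probs2 = torch.eye(3)

for name, probs in [("batch1", probs1), ("batch2", probs2)]:
    pred_mean = probs.mean(0)                  # both batch means are [1/3, 1/3, 1/3]
    penalty = torch.sum(prior * torch.log(prior / pred_mean))
    print(name, penalty.item())                # both penalties come out ~0
```

Both batches have the same batch-mean prediction, so the penalty cannot tell them apart.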
Hi bro, I am learning this implementation of DivideMix. The authors apply a transform in the warm-up training stage, and I wonder why they did that. Do you know? Maybe for best performance? Thanks.
Hi @YuanShunJie1, I am a freshman studying noisy-label learning. Warm-up is used at the start of training because the network tends to fit the clean samples first in the early stage (these samples have small loss values), so warming up makes the Co-divide step able to distinguish clean labels from noisy ones. That is my understanding.
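For what it's worth, a rough sketch of the Co-divide idea as I understand it: fit a two-component GMM to the per-sample losses and treat the low-loss component as clean. The function name and defaults below are mine, not the repo's exact code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(losses, threshold=0.5):
    """Fit a 2-component GMM to per-sample losses; the low-mean
    component is treated as the 'clean' set."""
    losses = np.asarray(losses, dtype=np.float64)
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    losses = losses.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    clean_prob = gmm.predict_proba(losses)[:, gmm.means_.argmin()]
    return clean_prob > threshold, clean_prob
```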
```python
prior = torch.ones(args.num_class) / args.num_class        # uniform prior over classes
prior = prior.cuda()
pred_mean = torch.softmax(logits, dim=1).mean(0)            # mean prediction over the batch
penalty = torch.sum(prior * torch.log(prior / pred_mean))   # KL(prior || pred_mean)
```
Entropy is `p*log(p)`, so why not `penalty = torch.sum(pred_mean*torch.log(prior/pred_mean))`?
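For comparison, here is a quick way to see what the two expressions compute on the same batch (variable names are mine, the batch is fake):

```python
import torch

probs = torch.softmax(torch.randn(8, 3), dim=1)   # a fake batch of softmax outputs
prior = torch.ones(3) / 3
pred_mean = probs.mean(0)

# penalty as written in the repo: sum_c prior_c * log(prior_c / pred_mean_c) = KL(prior || pred_mean)
penalty_repo = torch.sum(prior * torch.log(prior / pred_mean))

# penalty as suggested above, with pred_mean in front of the log; this equals -KL(pred_mean || prior)
penalty_suggested = torch.sum(pred_mean * torch.log(prior / pred_mean))

print(penalty_repo.item(), penalty_suggested.item())
```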