Closed ShikunLi closed 2 years ago
Hi @shikunLi,
Many thanks for your interest in our work. We don't have a cleaned-up version of the mini-WebVision code. However, it should be straightforward to reproduce our results by adapting the dataset code from, for example, the DivideMix repo, following the hyperparameters specified in the paper for this dataset (Table 1), and using the following data augmentation:
transform_train = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomResizedCrop(224, scale=(0.2, 1)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
We'll try to get some time to clean the code, but I'm not sure if it will be soon.
Thanks for your reply. I will try to achieve the results by following your suggestions.
Hi @DiegoOrtego, I have another question about the settings for the ELR results in your paper. I equipped ELR with mixup, as in ELR+ (see the code below), but it only achieves 65.60% test accuracy on CIFAR-100 with 40% asymmetric label noise (71.25% is reported in the paper). Could you share the hyperparameters used for ELR in your experiments?
import torch
import torch.nn as nn
import torch.nn.functional as F

class elr_mixup_loss(nn.Module):
    def __init__(self, num_examp, num_classes=100, elr_lambda=7, beta=0.9):
        super(elr_mixup_loss, self).__init__()
        # Running average of the softmax predictions, one row per training example
        self.pred_hist = (torch.zeros(num_examp, num_classes)).cuda()
        self.q = 0  # targets for the current batch, set by update_hist
        self.beta = beta
        self.num_classes = num_classes
        self.elr_lambda = elr_lambda

    def forward(self, output, y_labeled):
        y_pred = F.softmax(output, dim=1)
        y_pred = torch.clamp(y_pred, 1e-4, 1.0 - 1e-4)
        # Soft-label cross-entropy on the (mixed) labels
        ce_loss = torch.mean(-torch.sum(y_labeled * F.log_softmax(output, dim=1), dim=-1))
        # ELR regularizer: log(1 - <q, p>), averaged over the batch
        reg = ((1 - (self.q * y_pred).sum(dim=1)).log()).mean()
        final_loss = ce_loss + self.elr_lambda * reg
        return final_loss

    def update_hist(self, epoch, out, index=None, mix_index=..., mixup_l=1):
        y_pred_ = F.softmax(out, dim=1)
        # Exponential moving average of the predictions with momentum beta
        self.pred_hist[index] = self.beta * self.pred_hist[index] + (1 - self.beta) * y_pred_ / (y_pred_).sum(dim=1, keepdim=True)
        # Mix the targets with coefficient mixup_l and permutation mix_index
        self.q = mixup_l * self.pred_hist[index] + (1 - mixup_l) * self.pred_hist[index][mix_index]
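For readers following the arithmetic in `update_hist` and `forward`, here is a minimal pure-Python sketch of the same three steps on a toy batch of two samples and three classes. The function names (`ema_update`, `mix_targets`, `elr_regularizer`) and the toy numbers are hypothetical and only serve the illustration. This is not the code used in the paper.

```python
import math

def ema_update(target, pred, beta=0.9):
    """Running-average step per class: t <- beta * t + (1 - beta) * p."""
    return [beta * t + (1.0 - beta) * p for t, p in zip(target, pred)]

def mix_targets(targets, perm, lam):
    """Mixup of targets: q_i = lam * t_i + (1 - lam) * t_perm[i]."""
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(targets[i], targets[perm[i]])]
        for i in range(len(targets))
    ]

def elr_regularizer(q, probs):
    """Batch mean of log(1 - <q_i, p_i>), the term scaled by elr_lambda."""
    terms = [
        math.log(1.0 - sum(qi * pi for qi, pi in zip(q_row, p_row)))
        for q_row, p_row in zip(q, probs)
    ]
    return sum(terms) / len(terms)

# Toy softmax outputs for two samples (hypothetical values).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
# The history starts at zero, as pred_hist does in the class above.
targets = [[0.0] * 3, [0.0] * 3]
targets = [ema_update(t, p) for t, p in zip(targets, probs)]
# Swap the two samples and mix with coefficient 0.5.
q = mix_targets(targets, perm=[1, 0], lam=0.5)
reg = elr_regularizer(q, probs)
```

Because the history starts at zero and `beta` is 0.9, the targets stay small in early iterations, so the regularizer is close to zero at first and grows in magnitude as the history accumulates.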
Hi @ShikunLi,
Many thanks for your interest in our work! Much appreciated! Apologies for the delay in getting back to you; I've been quite busy over the last few weeks. Checking the CIFAR-100 scripts for ELR, this is what I see:
Epochs: 250
LR: 0.02, divided by 10 at epoch 200
Batch size: 128
Weight decay: 5e-4 (SGD with momentum 0.9)
coef_step: 40000
elr_lambda: 7 (3 for CIFAR-10)
elr_beta: 0.9 (0.7 for CIFAR-10)
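As a rough illustration of how these settings might translate into a schedule, the sketch below writes the learning-rate step and a ramp-up of the regularization weight as plain Python functions. The reply does not say what `coef_step` does. Treating it as the length, in iterations, of a sigmoid ramp-up on `elr_lambda` is an assumption here, and the helper names `lr_at_epoch`, `sigmoid_rampup`, and `reg_weight` are hypothetical.

```python
import math

BASE_LR = 0.02        # from the reply above
LR_DROP_EPOCH = 200   # from the reply above
COEF_STEP = 40000     # from the reply above; its use below is an assumption
ELR_LAMBDA = 7.0      # CIFAR-100 value from the reply above

def lr_at_epoch(epoch):
    """Step schedule: base LR until epoch 200, then divided by 10."""
    return BASE_LR / 10.0 if epoch >= LR_DROP_EPOCH else BASE_LR

def sigmoid_rampup(step, rampup_length):
    """Assumed ramp-up: exp(-5 * (1 - step / length)^2), clipped to [0, length]."""
    if rampup_length == 0:
        return 1.0
    step = min(max(step, 0), rampup_length)
    phase = 1.0 - step / rampup_length
    return math.exp(-5.0 * phase * phase)

def reg_weight(step):
    """Regularization weight at a given iteration under the assumed ramp-up."""
    return ELR_LAMBDA * sigmoid_rampup(step, COEF_STEP)
```

Under this assumption the regularizer starts at a small fraction of `elr_lambda` and reaches the full value of 7 after 40000 iterations. If the loss is instead used with a constant weight, as in the `elr_mixup_loss` class earlier in the thread, the effective regularization in early training is much stronger, which is one possible source of a gap in accuracy.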
Best, Diego.
Hi, thanks for your interesting work! As reported in the paper, MOIT performs much better than other SOTA methods on the mini-WebVision dataset. Could you share the code for the mini-WebVision experiments? That would help me a lot!