clovaai / CutMix-PyTorch

Official PyTorch implementation of the CutMix regularizer

Training loss/accuracy fluctuates too much when using CutMix regularization #18

Closed · Kaushal28 closed this issue 4 years ago

Kaushal28 commented 4 years ago

I'm trying to implement CutMix on the CIFAR-10 dataset. Here is my implementation, based on the pseudocode given in the paper:

import numpy as np

# Apply CutMix with probability 40% (skip it otherwise)
cutmix_decision = np.random.rand()
if cutmix_decision > 0.60:
    # CutMix: https://arxiv.org/pdf/1905.04899.pdf
    x_train_shuffled, y_train_shuffled = shuffle_minibatch(x_train, y_train)
    lam = np.random.beta(CUTMIX_ALPHA, CUTMIX_ALPHA)
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int is deprecated; use the builtin int
    cut_h = int(H * cut_rat)

    # Sample the patch center uniformly
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    # Clip the box to the image boundaries
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    # Paste the patch from the shuffled batch (NCHW layout; W == H for CIFAR-10)
    x_train[:, :, bbx1:bbx2, bby1:bby2] = x_train_shuffled[:, :, bbx1:bbx2, bby1:bby2]
    # Adjust lam to the actual (clipped) patch area
    lam = 1 - (bbx2 - bbx1) * (bby2 - bby1) / (W * H)
    y_train = lam * y_train + (1 - lam) * y_train_shuffled

And here is the shuffle_minibatch() function:

import torch

def shuffle_minibatch(x, y):
    # Permute the batch, applying the same random permutation to
    # inputs and labels.
    assert x.size(0) == y.size(0)
    indices = torch.randperm(x.size(0))
    return x[indices], y[indices]

I'm using PyTorch to train the model. The regularization is applied randomly with probability 40% (cutmix_decision > 0.60). When I train the model, the training loss/accuracy fluctuates far too much, while the validation accuracy stays stable; given the stable validation accuracy, I'm assuming the CutMix implementation is correct.

Here is the accuracy curve for both training and validation datasets.

[plot: training vs. validation accuracy]

Is this normal behavior when using CutMix regularization, or am I missing something? Is the rate of regularization too high? Or is the image resolution too low for this type of regularization? In case you are interested in the full implementation, here is my notebook: https://www.kaggle.com/kaushal2896/cifar-10-simple-cnn-with-cutmix-using-pytorch

hellbell commented 4 years ago

@Kaushal28 Thank you for your interest in CutMix. I'm concerned about one line of your implementation: y_train = lam * y_train + (1 - lam) * y_train_shuffled. Since the elements of y_train are class indices rather than one-hot vectors, this weighted sum produces invalid labels. Our implementation blends the losses instead of the labels: loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam) (https://github.com/clovaai/CutMix-PyTorch/blob/master/train.py#L240).

Hope this helps you :)
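For reference, here is a minimal sketch of that loss-blending step as a self-contained snippet. The helper name cutmix_criterion is only illustrative, and model, x_train, y_train, y_train_shuffled, and lam are assumed to come from the code above (with the label-blending line removed so y_train keeps its original class indices):

import torch.nn as nn

def cutmix_criterion(criterion, output, target_a, target_b, lam):
    # Blend the two losses instead of blending the integer class labels.
    return criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)

criterion = nn.CrossEntropyLoss()
output = model(x_train)  # forward pass on the CutMixed batch
loss = cutmix_criterion(criterion, output, y_train, y_train_shuffled, lam)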

Kaushal28 commented 4 years ago

@hellbell, Thanks for your prompt response. I tried the changes you mentioned, and the following is the new accuracy curve:

[plot: training vs. validation accuracy after the fix]

There is still a considerable amount of fluctuation in the training accuracy. Is that normal behaviour? This version does achieve higher validation accuracy (~76-77%). I understand that the final goal is to increase validation accuracy, and that fluctuations in training accuracy/loss don't matter much as long as validation accuracy is high and stable; I just want to understand the behaviour.

hellbell commented 4 years ago

@Kaushal28 How are you computing training accuracy? Obtaining a training accuracy from CutMixed samples is not straightforward; indeed, we did not pay much attention to training accuracy.

I suspect your implementation, train_acc = get_accuracy(y_preds, y_train), cannot compute the accuracy correctly because it matches predictions only against y_train and ignores the labels of the pasted patches.
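If you do want a training-accuracy number on CutMixed batches, one option (a sketch only, not something the repo implements) is to weight the accuracy against each label set by the same lam used for the loss:

def cutmix_accuracy(output, target_a, target_b, lam):
    # Hypothetical helper: weight the accuracy against each label set
    # by the patch-area ratio lam, mirroring the loss blending.
    preds = output.argmax(dim=1)
    acc_a = (preds == target_a).float().mean().item()
    acc_b = (preds == target_b).float().mean().item()
    return lam * acc_a + (1. - lam) * acc_b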

Kaushal28 commented 4 years ago

@hellbell Perfect! Everything is clear now! Thanks for your prompt responses!

YuKaixinn commented 6 months ago

@hellbell Perfect! Everything is clear now! Thanks for your prompt responses!

May I ask how you solved this problem?