clovaai / CutMix-PyTorch

Official PyTorch implementation of the CutMix regularizer

Training loss/accuracy fluctuates too much when using CutMix regularization #18

Closed · Kaushal28 closed this issue 4 years ago

Kaushal28 commented 4 years ago

I'm trying to implement CutMix on the CIFAR-10 dataset. Here is my implementation, based on the pseudocode given in the paper:

import numpy as np

# Apply CutMix with probability 40% (skip it otherwise)
cutmix_decision = np.random.rand()
if cutmix_decision > 0.60:
    # CutMix: https://arxiv.org/pdf/1905.04899.pdf
    x_train_shuffled, y_train_shuffled = shuffle_minibatch(x_train, y_train)
    lam = np.random.beta(CUTMIX_ALPHA, CUTMIX_ALPHA)
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int is deprecated; use the builtin int
    cut_h = int(H * cut_rat)

    # Sample the patch center uniformly
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    # Clip the box to the image boundaries
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    # Paste the patch from the shuffled batch (NCHW layout; W == H for CIFAR-10)
    x_train[:, :, bbx1:bbx2, bby1:bby2] = x_train_shuffled[:, :, bbx1:bbx2, bby1:bby2]
    # Adjust lam to the actual (clipped) patch area
    lam = 1 - (bbx2 - bbx1) * (bby2 - bby1) / (W * H)
    y_train = lam * y_train + (1 - lam) * y_train_shuffled

And here is the shuffle_minibatch() function:

import torch

def shuffle_minibatch(x, y):
    # Permute the batch, applying the same random permutation to
    # inputs and labels.
    assert x.size(0) == y.size(0)
    indices = torch.randperm(x.size(0))
    return x[indices], y[indices]

I'm using PyTorch to train the model. The regularization is applied randomly with probability 40% (cutmix_decision > 0.60). When I train the model, the training loss/accuracy fluctuates far too much, while the validation accuracy stays stable; given the stable validation accuracy, I'm assuming the CutMix implementation is correct.

Here is the accuracy curve for both training and validation datasets.

[plot: training vs. validation accuracy]

Is this normal behavior when using CutMix regularization, or am I missing something? Is the rate of regularization too high? Or is the image resolution too low for this type of regularization? In case you are interested in the full implementation, here is my notebook: https://www.kaggle.com/kaushal2896/cifar-10-simple-cnn-with-cutmix-using-pytorch

hellbell commented 4 years ago

@Kaushal28 Thank you for your interest in CutMix. I'm concerned about one line of your implementation: y_train = lam * y_train + (1 - lam) * y_train_shuffled. Since the elements of y_train are class indices rather than one-hot vectors, this weighted sum produces invalid labels. Our implementation blends the losses instead of the labels: loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam) (https://github.com/clovaai/CutMix-PyTorch/blob/master/train.py#L240).

Hope this helps you :)
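For reference, here is a minimal sketch of that loss-blending step as a self-contained snippet. The helper name cutmix_criterion is only illustrative, and model, x_train, y_train, y_train_shuffled, and lam are assumed to come from the code above (with the label-blending line removed so y_train keeps its original class indices):

import torch.nn as nn

def cutmix_criterion(criterion, output, target_a, target_b, lam):
    # Blend the two losses instead of blending the integer class labels.
    return criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)

criterion = nn.CrossEntropyLoss()
output = model(x_train)  # forward pass on the CutMixed batch
loss = cutmix_criterion(criterion, output, y_train, y_train_shuffled, lam)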

Kaushal28 commented 4 years ago

@hellbell, Thanks for your prompt response. I tried the changes you mentioned, and the following is the new accuracy curve:

[plot: training vs. validation accuracy after the fix]

There is still a considerable amount of fluctuation in the training accuracy. Is that normal behaviour? This version does achieve higher validation accuracy (~76-77%). I understand that the final goal is to increase validation accuracy, and that fluctuations in training accuracy/loss don't matter much as long as validation accuracy is high and stable; I just want to understand the behaviour.

hellbell commented 4 years ago

@Kaushal28 How are you computing training accuracy? Obtaining a training accuracy from CutMixed samples is not straightforward; indeed, we did not pay much attention to training accuracy.

I suspect your implementation, train_acc = get_accuracy(y_preds, y_train), cannot compute the accuracy correctly because it matches predictions only against y_train and ignores the labels of the pasted patches.
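If you do want a training-accuracy number on CutMixed batches, one option (a sketch only, not something the repo implements) is to weight the accuracy against each label set by the same lam used for the loss:

def cutmix_accuracy(output, target_a, target_b, lam):
    # Hypothetical helper: weight the accuracy against each label set
    # by the patch-area ratio lam, mirroring the loss blending.
    preds = output.argmax(dim=1)
    acc_a = (preds == target_a).float().mean().item()
    acc_b = (preds == target_b).float().mean().item()
    return lam * acc_a + (1. - lam) * acc_b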

Kaushal28 commented 4 years ago

@hellbell Perfect! Everything is clear now! Thanks for your prompt responses!

YuKaixinn commented 6 months ago

@hellbell Perfect! Everything is clear now! Thanks for your prompt responses!

May I ask how you solved this problem?