Kaushal28 closed this issue 4 years ago
@Kaushal28
Thank you for your interest in CutMix.
I'm concerned about this line in your implementation:
y_train = lam * y_train + (1 - lam) * y_train_shuffled
Since y_train holds class indices rather than one-hot vectors, this weighted sum produces meaningless label values.
Our implementation looks like this:
loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)
(https://github.com/clovaai/CutMix-PyTorch/blob/master/train.py#L240)
We don't blend the labels directly; we blend the losses.
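Put together, a minimal sketch of this loss-blending training step might look like the following. The box sampling follows the region-sampling scheme from the CutMix paper; the function names (`rand_bbox` aside, which matches the repo, `cutmix_step` is illustrative) and the `beta=1.0` default are assumptions, not code from the repository:

```python
import numpy as np
import torch

def rand_bbox(size, lam):
    """Sample a random box covering roughly (1 - lam) of the image area."""
    _, _, h, w = size
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    return y1, y2, x1, x2

def cutmix_step(model, criterion, x, target, beta=1.0):
    """One CutMix step: paste a random patch from a shuffled copy of the
    batch, then blend the two losses instead of blending the labels."""
    lam = np.random.beta(beta, beta)
    perm = torch.randperm(x.size(0))
    target_a, target_b = target, target[perm]
    y1, y2, x1, x2 = rand_bbox(x.size(), lam)
    x[:, :, y1:y2, x1:x2] = x[perm, :, y1:y2, x1:x2]
    # Re-derive lam from the exact pasted-area ratio (clipping can shrink the box).
    lam = 1.0 - ((y2 - y1) * (x2 - x1) / (x.size(-1) * x.size(-2)))
    output = model(x)
    loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1.0 - lam)
    return output, loss
```

Note that `target_a` and `target_b` stay integer class indices throughout; only the scalar losses are mixed.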
Hope this helps you :)
@hellbell, thanks for your prompt response. I tried the suggested changes, and the following is the accuracy curve:
There is still a considerable amount of fluctuation in the training accuracy. Is that normal behaviour? On the other hand, this change achieves higher validation accuracy (~76-77%). I understand that the final goal is to increase validation accuracy, and fluctuations in training accuracy/loss don't matter much as long as validation accuracy is high and stable; I just want to understand the behaviour.
@Kaushal28 How are you computing training accuracy? Obtaining training accuracy from CutMix-ed samples is not straightforward; indeed, we did not pay much attention to training accuracy.
I suspect your implementation, train_acc = get_accuracy(y_preds, y_train), cannot compute the accuracy correctly because it matches predictions against y_train only, ignoring the labels of the pasted patches.
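One common workaround, mirroring the blended loss, is to weight the match against both label sets by the mixing ratio. This is an illustrative sketch under that assumption, not code from the CutMix repository:

```python
import torch

def cutmix_accuracy(y_preds, target_a, target_b, lam):
    """Approximate training accuracy on a CutMix batch by weighting the
    match against the original and the pasted labels by lam and 1 - lam."""
    preds = y_preds.argmax(dim=1)
    acc_a = (preds == target_a).float().mean()
    acc_b = (preds == target_b).float().mean()
    return lam * acc_a + (1.0 - lam) * acc_b
```

This is only a proxy: a single prediction can never "match" both labels at once, so the number is best read as a smoothed trend rather than a true accuracy.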
@hellbell Perfect! Everything is clear now! Thanks for your prompt responses!
May I ask how you solved this problem?
I'm trying to implement CutMix on the CIFAR-10 dataset. Here is my implementation, based on the paper's pseudocode:
And here is the shuffle_minibatch() function:
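The original snippets are not preserved in this thread; as context for the discussion that follows, a representative sketch of such a shuffle_minibatch helper (the name comes from the question above, the body is an assumption) could be:

```python
import torch

def shuffle_minibatch(x, y):
    """Return the batch and its labels in one random permuted order,
    so each sample gets a random partner for the CutMix paste."""
    perm = torch.randperm(x.size(0))
    return x[perm], y[perm]
```

The key point is that images and labels are permuted with the same index tensor, so each shuffled image still carries its own label.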
I'm using PyTorch to train the model. The regularization is applied randomly with probability 40% (cutmix_decision > 0.60). When I train the model, the training loss/accuracy fluctuates far too much, while the validation accuracy stays stable; given the stable validation accuracy, I'm assuming the CutMix implementation is correct.
Here is the accuracy curve for both training and validation datasets.
Is this normal behavior when using CutMix regularization, or am I missing something? Is the rate of regularization too high? Or is the image resolution too low for this type of regularization? In case you are interested in the full implementation, here is my notebook: https://www.kaggle.com/kaushal2896/cifar-10-simple-cnn-with-cutmix-using-pytorch