akamaster / pytorch_resnet_cifar10

Proper implementation of ResNet-s for CIFAR10/100 in pytorch that matches description of the original paper.

Epochs chosen different than the paper #6

Closed PabloRR100 closed 5 years ago

PabloRR100 commented 5 years ago

Hi @akamaster,

The train set has 45,000 images. With a batch size of 128, that yields roughly 352 iterations per epoch. In the paper they train the network for 64,000 iterations, which corresponds to about 181 epochs of training.

Please let me know if you agree.

akamaster commented 5 years ago

Yes, I agree, partially. In this code there is no train/val split, so the train set is the full 50k images => ~390 iterations per epoch with batch size 128. Therefore, to match the paper, training should total about 165 epochs, with milestones at epochs 81, 123, and 164. The pretrained networks in the repo were generated with a total of 200 epochs and milestones at 100, 150, and 200.
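For reference, a minimal sketch of the iterations-to-epochs arithmetic in this comment and the one above, assuming a batch size of 128 and a DataLoader that keeps the last partial batch (dropping it shifts each milestone by a fraction of an epoch):

```python
import math

# Convert the paper's iteration counts into epoch counts for a given train-set size.
# Assumes batch size 128 and drop_last=False (the last, smaller batch is kept).
BATCH_SIZE = 128

def iters_to_epochs(iterations, num_train_images, batch_size=BATCH_SIZE):
    iters_per_epoch = math.ceil(num_train_images / batch_size)
    return iterations / iters_per_epoch

for num_images, label in [(45_000, "paper's 45k/5k split"), (50_000, "full 50k train set")]:
    lr_drop1, lr_drop2, stop = (iters_to_epochs(it, num_images) for it in (32_000, 48_000, 64_000))
    print(f"{label}: LR drops at ~{lr_drop1:.0f} and ~{lr_drop2:.0f} epochs, stop at ~{stop:.0f} epochs")
```

This gives roughly 91/136/182 epochs for the paper's 45k split and roughly 82/123/164 for the full 50k train set, i.e. within a rounding step of the milestones quoted above.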

PabloRR100 commented 5 years ago

Thanks for the reply.

A few questions:

Thanks!

dwromero commented 5 years ago

Hey, I have the same comment as Pablo. So, are you using the test set as the validation set? I was just looking at the paper, and they state the following:

"We start with a learning rate of 0.1, divide it by 10 at 32k and 48k iterations, and terminate training at 64k iterations, which is determined on a 45k/5k train/val split" --> I think this part is indeed lacking on your implementation. I would be happy to add that if you like :)

Cheers, David

kirk86 commented 5 years ago

Most repos I've seen with pretrained models overfit the test set. That's another reason the numbers look so good. At a minimum there should be a train/val/test split.

PabloRR100 commented 5 years ago

Hi @kirk86

Saying the model overfits the test set does not make sense, right? Since the model is not "seeing" (or trying to fit) the test data, it cannot overfit it.

Cheers, Pablo

kirk86 commented 5 years ago

Hi @PabloRR100, it's true that the model is not trying to fit the test data directly, but think about why we use a validation set in the first place. IMHO the validation set is there to control the bias/variance tradeoff, and based on it you modify your model. If you instead use the test set to tune your model based on that tradeoff, how exactly are you not overfitting the test set? Again, IMHO the test set should remain untouched at all times and be exposed only once, at the end, after the model has been trained, to evaluate its generalization capabilities.

akamaster commented 5 years ago

Dear @PabloRR100 and @kirk86, you are both right. However, in current deep learning, even if you do use a validation set to control the bias/variance tradeoff, the fact that everyone publishes better results implicitly means optimizing over (looking into) the test data. Clearly, if a model did not improve on the test set, no one would publish it; therefore, whenever something 'better' appears, it is necessarily overfitting the test data.