Open ducha-aiki opened 8 years ago
Just curious, do you have any idea why accuracy and loss curves have these steps? It seems to me that only the change in learning rate causes a huge leap in accuracy and there is hardly any learning between points where learning rate is stable.
@pliskowski Yes, the steps are because of the learning rate change. As for your second guess, it is not correct. First, learning goes on even if it is slow: if you look at the training logs, the loss keeps decreasing. Second, there is a huge temptation to do the steps, say, not every 100K iterations but at 100K, 120K and 140K. However, that hurts your performance a lot - by several accuracy percent. It may not be a big deal for some practical tasks, but for ImageNet and, say, Kaggle, it makes all the difference in the world :)
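For illustration, a minimal Python sketch of the two schedules being compared; the base LR of 0.01, gamma = 0.1 and 100K stepsize are assumed typical values here, not necessarily this repo's exact solver settings:

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=100_000):
    """Caffe-style "step" policy: multiply the LR by gamma every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)

def early_steps_lr(iteration, base_lr=0.01, gamma=0.1, steps=(100_000, 120_000, 140_000)):
    """The tempting alternative: extra drops at 120K and 140K ("multistep"-style schedule)."""
    return base_lr * gamma ** sum(iteration >= s for s in steps)

# At iteration 130K the first schedule is still at 0.001,
# while the second one has already dropped to 0.0001.
print(step_lr(130_000), early_steps_lr(130_000))
```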
@ducha-aiki I can see that there is learning but it is very slow compared to those leaps. Is there any explanation why decreasing learning rate causes such a significant boost in performance? I just cannot imagine how the classifier improves its performance over a few iterations just because of reducing the learning rate.
@pliskowski Imagine you need to reach a=1.2345. You start with a=0 and are allowed to add or subtract 1*learning_rate per step. If you begin to decrease the learning rate too early, you never even reach a=1. And once you have reached a=1 with learning_rate=1, it is extremely hard to get to a=1.23 until you decrease the learning rate.
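A tiny toy script (hypothetical, just to make the mechanics of that example concrete):

```python
def fit(target=1.2345, lr=1.0, decay_every=None, steps=40):
    """Greedily add or subtract `lr`, optionally dividing it by 10 every `decay_every` steps."""
    a = 0.0
    for t in range(1, steps + 1):
        a += lr if a < target else -lr
        if decay_every and t % decay_every == 0:
            lr /= 10.0
    return a

print(fit(decay_every=None))  # with a fixed lr of 1 it just oscillates between 1.0 and 2.0
print(fit(decay_every=10))    # with decay it settles within about 0.001 of 1.2345
```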
Added ELU, on-going PReLU and RReLU
Added BN+ReLU x (Dropout = 0; 0.2; 0.5); Dropout = 0.2 wins.
On-going:
Finished PReLU, RReLU and maxout tests. The maxout results show that you need to wait until training finishes to judge the results, not go by the first 100K iterations.
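For reference, illustrative numpy versions of the activations under comparison (the PReLU init of 0.25 and the RReLU bounds of [1/8, 1/3] are the common defaults from the papers, assumed here rather than read from this repo's prototxt):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def prelu(x, a=0.25):
    # PReLU: the negative slope `a` is learned per channel; 0.25 is the usual init
    return np.where(x > 0, x, a * x)

def rrelu(x, lower=1 / 8, upper=1 / 3, rng=np.random.default_rng(0)):
    # RReLU: the negative slope is sampled uniformly at train time (averaged at test time)
    a = rng.uniform(lower, upper, size=x.shape)
    return np.where(x > 0, x, a * x)

def maxout(x, w, b):
    # Maxout: max over k linear pieces, so the unit learns its own activation shape
    # x: (batch, d_in), w: (k, d_in, d_out), b: (k, d_out)
    return np.max(np.einsum('bi,kio->bko', x, w) + b, axis=1)
```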
In progress:
Added:
In progress:
Added:
In progress:
Added nice tables with results
Added:
ThinResNet: a 100-layer-deep residual net with CaffeNet speed. Maxout + BN.
In progress:
Added stochastic pooling
Added one more attempt to train MSRA ResNet.
Added:
Do you plan to implement Net2Net to extend some of these archs without too much retraining?
@bhack yes, but not for testing - with Net2Net it would be an unfair comparison, I guess. For me, the outcome of the comparison is not only the final accuracies, but also the training graphs, which can give you some insights.
And when cudnn3 is supported by TensorFlow, I am afraid I will migrate there.
@ducha-aiki Yes, I meant it for fast evaluation of improved accuracy. Then for formal testing you could always retrain the newly extended arch from scratch initialization.
Your choice is understandable.
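For context on the Net2Net idea discussed above, here is a rough numpy sketch of the Net2WiderNet operation (layer sizes are made up for illustration; this is not code from the repo): a layer is widened by duplicating units and rescaling their outgoing weights so the network's function is preserved, which is also why it would give the extended arch a head start in a comparison.

```python
import numpy as np

def net2wider(w1, b1, w2, new_width, rng=np.random.default_rng(0)):
    """Widen a fully connected layer from w1.shape[1] to new_width units,
    preserving the network's function (Net2WiderNet, Chen et al., 2015)."""
    old_width = w1.shape[1]
    # each new unit copies a randomly chosen existing unit
    mapping = np.concatenate([np.arange(old_width),
                              rng.integers(0, old_width, new_width - old_width)])
    counts = np.bincount(mapping, minlength=old_width)
    w1_new = w1[:, mapping]                              # duplicate incoming weights
    b1_new = b1[mapping]
    w2_new = w2[mapping, :] / counts[mapping][:, None]   # split outgoing weights among copies
    return w1_new, b1_new, w2_new

# sanity check: a ReLU MLP gives the same outputs before and after widening
x = np.random.randn(4, 8)
w1, b1, w2 = np.random.randn(8, 16), np.random.randn(16), np.random.randn(16, 10)
w1n, b1n, w2n = net2wider(w1, b1, w2, new_width=24)
out_old = np.maximum(x @ w1 + b1, 0) @ w2
out_new = np.maximum(x @ w1n + b1n, 0) @ w2n
assert np.allclose(out_old, out_new)
```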
@ducha-aiki "And when..." https://github.com/tensorflow/tensorflow/commit/22ebf0a94fd42af2d78b7964e836c92673ddfa31
@bhack thanks! Will try next week. btw, tomorrow new results in colorspaces for caffenet :)
Added colorspace, poolings, and googlenet-128 for baseline
@bhack looks like still no CUDA 7.5 Support :(
Hi,
I have started a batchnorm/activations/architectures evaluation on ImageNet 2012 with image side = 128. The reason for 128 is that training is much faster (48 hours on a GTX 980) than with the default setup, and it does not change the overall picture.
The reason for ImageNet is that CIFAR10/MNIST experiments are not representative of big datasets. E.g., VLReLU is better than ReLU on CIFAR10 but worse on ImageNet; BatchNorm hurts on CIFAR10 but helps on ImageNet, etc.
BatchNorm evaluation: https://github.com/ducha-aiki/caffenet-benchmark Activations evaluation: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Activations.md Architectures evaluation: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Architectures.md
I am not sure whether this is more relevant as an Issue or a Wiki page, but I think the community could benefit from it.
P.S. Requests for "what to evaluate next" are welcome. PRs with your tests are welcomed even more :) P.P.S. Current on-going training: