JingyunLiang closed this issue 6 years ago.
Problem solved. I came across numerical instability in the `sqrt` layer. We should use `sqrt(x+1e-12)` instead of `sqrt(x)`.
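To show why the epsilon matters, here is a framework-free sketch that mimics how autograd differentiates the layer. It assumes the layer is the signed square root sign(x) * sqrt(|x|) commonly used in bilinear CNNs. The function names are hypothetical. At x = 0 the derivative of sqrt is infinite, and multiplying it by sign(0) = 0 gives NaN, which can then propagate into the weights. Adding 1e-12 keeps that derivative finite.

```python
import math


def sign(x: float) -> float:
    # Like torch.sign: returns 0.0 for x == 0.
    return float((x > 0) - (x < 0))


def signed_sqrt(x: float, eps: float = 1e-12) -> float:
    # Forward pass: sign(x) * sqrt(|x| + eps).
    return sign(x) * math.sqrt(abs(x) + eps)


def signed_sqrt_grad(x: float, eps: float = 1e-12) -> float:
    # Backward pass via the chain rule, as autograd would compute it.
    # d sign(x)/dx is treated as 0, so only the sqrt branch contributes:
    # sign(x) * 1 / (2 * sqrt(|x| + eps)) * d|x|/dx, with d|x|/dx = sign(x).
    a = abs(x) + eps
    d_sqrt = math.inf if a == 0.0 else 1.0 / (2.0 * math.sqrt(a))
    return sign(x) * d_sqrt * sign(x)


for eps in (0.0, 1e-12):
    print(f"eps={eps}: grad at 0 = {signed_sqrt_grad(0.0, eps)}")
# eps=0.0: grad at 0 = nan
# eps=1e-12: grad at 0 = 0.0
```

In PyTorch the same forward pass can be written as `torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-12)`.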
@MichaelLiang12 Thanks. I also managed to reach ~84% validation accuracy, matching the original paper. I found that it is important to scale the pixels to the range [0, 1] before subtracting the mean and dividing by the std, since the VGG architecture has no BN layers.
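A minimal sketch of that preprocessing order, written per pixel for clarity. The mean/std values are the ImageNet statistics commonly paired with torchvision's pretrained VGG models. That choice is an assumption, as is the function name. Skipping the [0, 1] scaling leaves the normalized inputs above 1000 instead of within a few units of zero. With no BN layers to rescale activations, the pretrained weights then see inputs far outside the range they were trained on.

```python
# Per-channel ImageNet statistics as used by torchvision's pretrained models
# (an assumption: the thread does not say which mean/std was used).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, scale_to_unit=True):
    """Normalize one RGB pixel given as 0-255 integers."""
    # Step 1: scale to [0, 1] so the values match the range the stats assume.
    if scale_to_unit:
        values = [c / 255.0 for c in rgb]
    else:
        values = [float(c) for c in rgb]
    # Step 2: subtract the per-channel mean and divide by the per-channel std.
    return [(v - m) / s for v, m, s in zip(values, IMAGENET_MEAN, IMAGENET_STD)]


print(normalize_pixel((255, 255, 255)))         # roughly 2.2 to 2.6 per channel
print(normalize_pixel((255, 255, 255), False))  # over 1000 per channel
```

With torchvision, `transforms.ToTensor()` performs the [0, 1] scaling and `transforms.Normalize(mean, std)` performs the second step, so they must be applied in that order.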
In the first stage, the paper extracts features and trains a logistic regression classifier on them. As an alternative, I freeze the earlier layers and train only the last layer, which should give comparable results.
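A framework-free sketch of that idea: the feature extractor is fixed and only a linear classifier on top is trained with logistic loss, which is what training logistic regression on extracted features amounts to. The feature map, toy data, and binary (rather than multi-class) setup are hypothetical simplifications.

```python
import math


def frozen_features(x):
    # Hypothetical stand-in for the frozen backbone: a fixed feature map
    # with no trainable parameters, so training can never change it.
    return [x[0], x[1], x[0] * x[1]]


def score(w, b, f):
    return sum(wi * fi for wi, fi in zip(w, f)) + b


def train_last_layer(data, epochs=200, lr=0.5):
    # Features are extracted once and reused, as with pre-extracted features.
    feats = [(frozen_features(x), y) for x, y in data]
    w = [0.0] * len(feats[0][0])
    b = 0.0
    for _ in range(epochs):
        for f, y in feats:
            p = 1.0 / (1.0 + math.exp(-score(w, b, f)))  # sigmoid probability
            g = p - y  # d(log-loss)/d(score) for binary logistic regression
            # Only the last layer's weights and bias are updated.
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    return 1 if score(w, b, frozen_features(x)) > 0 else 0


data = [((1.0, 1.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.0), 0), ((-0.5, -2.0), 0)]
w, b = train_last_layer(data)
print([predict(w, b, x) for x, _ in data])  # [1, 1, 0, 0]
```

In PyTorch, the same effect comes from setting `requires_grad = False` on the backbone's parameters and passing only the classifier's parameters to the optimizer.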
I implemented it in PyTorch, but it fails. The first stage converges and reaches 45% accuracy on the birds dataset. However, the second stage never converges and stays at 0.5% accuracy.
Are there any tricks for training?