abhaydoke09 / Bilinear-CNN-TensorFlow

This is an implementation of Bilinear CNN for fine-grained visual recognition using TensorFlow.
191 stars, 72 forks

Questions about two-step training procedure #17

Closed. JingyunLiang closed this 6 years ago

JingyunLiang commented 6 years ago

In the first stage, the paper extracts features and trains a logistic regression classifier on them. I also thought of freezing the earlier layers and training only the last layer, which should give comparable results.

I implemented this in PyTorch, but it failed. The first stage converges and reaches 45% accuracy on birds. The second stage, however, does not converge and stays at 0.5% accuracy.

Is there any trick during training?
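
For readers following along, the two-stage schedule described above (train only the last layer first, then fine-tune everything) might be set up roughly as follows in PyTorch. This is only a sketch under stated assumptions: `backbone` is a tiny hypothetical stand-in for the VGG convolutional layers, the 200-class output and the learning rates are illustrative, and `set_trainable` and `trainable_params` are helper names made up for this example.

```python
import torch
from torch import nn

# Hypothetical stand-ins: a tiny conv stack instead of VGG, and 200 classes.
backbone = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
classifier = nn.Linear(8 * 8, 200)  # bilinear feature of 8 channels has 8*8 dims


def set_trainable(module, flag):
    """Enable or disable gradient tracking for all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = flag


def trainable_params(*modules):
    """Collect the parameters that currently require gradients."""
    return [p for m in modules for p in m.parameters() if p.requires_grad]


# Stage 1: freeze the backbone and train only the last layer.
set_trainable(backbone, False)
stage1_params = trainable_params(backbone, classifier)
stage1_optimizer = torch.optim.SGD(stage1_params, lr=1.0, momentum=0.9)  # lr is an assumption

# Stage 2: unfreeze everything and fine-tune with a smaller learning rate.
set_trainable(backbone, True)
stage2_params = trainable_params(backbone, classifier)
stage2_optimizer = torch.optim.SGD(stage2_params, lr=1e-3, momentum=0.9)  # lr is an assumption
```

Building a fresh optimizer for the second stage keeps the frozen parameters out of the first optimizer entirely, so no momentum state carries over from one stage to the next.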

JingyunLiang commented 6 years ago

Problem solved. I ran into numerical instability in the sqrt layer. We should use sqrt(x+1e-12) instead of sqrt(x).
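
A likely reason for the instability: the derivative of sqrt(x) is 1/(2*sqrt(x)), which is unbounded at x = 0, so a single zero entry can put an infinite value into the backward pass. The small PyTorch check below illustrates this. It is a minimal sketch, not code from either implementation, and `sqrt_grad` is a helper name invented for the example.

```python
import torch


def sqrt_grad(values, eps=0.0):
    """Return d/dx of sum(sqrt(x + eps)) evaluated at the given values."""
    x = torch.tensor(values, requires_grad=True)
    torch.sqrt(x + eps).sum().backward()
    return x.grad


plain = sqrt_grad([0.0, 4.0])            # gradient at 0 is infinite
safe = sqrt_grad([0.0, 4.0], eps=1e-12)  # gradient at 0 is large but finite

print(plain)
print(safe)
```

The epsilon only changes the result noticeably for entries at or very near zero; for ordinary values such as 4.0 the gradient is unchanged to within floating-point precision.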

WeitaoVan commented 6 years ago

@MichaelLiang12 Thanks. I also managed to reach ~84% validation accuracy, matching the original paper. I found it important to scale the pixels to the range [0, 1] before subtracting the mean and dividing by the std, given that the VGG architecture has no BN layers.
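
The preprocessing order described above (scale to [0, 1] first, then standardize) could look like the sketch below. The mean and std values are the commonly used ImageNet statistics, assumed here because the thread does not say which statistics were used. The 448x448 input size and the `preprocess` name are likewise illustrative.

```python
import torch

# Assumption: per-channel ImageNet statistics, defined for pixels in [0, 1].
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def preprocess(image_uint8):
    """Scale a (3, H, W) uint8 image to [0, 1], then standardize per channel."""
    x = image_uint8.float() / 255.0  # step 1: [0, 255] -> [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD  # step 2: subtract mean, divide by std


# Random stand-in for a decoded RGB image.
image = torch.randint(0, 256, (3, 448, 448), dtype=torch.uint8)
out = preprocess(image)
```

Skipping the division by 255 would leave the inputs roughly two orders of magnitude larger than these statistics expect, and without BN layers nothing downstream rescales them.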