Jianlong-Fu / Recurrent-Attention-CNN


Paper's VGG-19 accuracy question #8

Open simo23 opened 7 years ago

simo23 commented 7 years ago

Hi, first of all thanks for your great work!

In your paper you cite the VGG-19 [27] model and state that it achieves 77.8% accuracy on the CUB-200-2011 dataset. Can you please give some more info about this? Are you referring to the model trained only on ImageNet, to a model you fine-tuned yourselves, or to one fine-tuned by someone else? Is it the Caffe model?

And if you did train it, can you share some of the details, like batch size, learning rate, number of epochs, and data augmentation?

Thanks, Andrea

simo23 commented 7 years ago

Hi, thanks for your answer.

Unfortunately, I'm already performing isotropic scaling on the images and random cropping.

If you don't mind I have some questions:

Thanks

On Sat, Sep 30, 2017 at 3:20 AM, bhchen notifications@github.com wrote:

@simo23 Hi, maybe I can answer your question: the important thing is data preprocessing. I suggest you normalize the shortest edge of the original image to 512, keeping the original aspect ratio, then take a random 448x448 crop during training. I use the original VGG19 model and achieve 78.3% acc on CUB. Good luck to you.

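The preprocessing chenbinghui1 describes (shortest edge to 512 at the original aspect ratio, then a random 448x448 crop) can be sketched with a couple of plain Python helpers; the function names here are made up for illustration:

```python
import random

def resize_dims(width, height, short_side=512):
    """Image size after scaling so the shortest edge equals `short_side`,
    keeping the original aspect ratio."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)

def random_crop_box(width, height, crop=448, rng=random):
    """(left, top, right, bottom) of a random crop inside the resized image."""
    left = rng.randint(0, width - crop)
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop

# e.g. an 800x600 photo becomes 683x512, then a random 448x448 window is cut.
w, h = resize_dims(800, 600)          # (683, 512)
box = random_crop_box(w, h)
```

At test time you would replace the random crop with a central one, as mentioned later in the thread.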

chenbinghui1 commented 6 years ago

@simo23 Hi, to your questions:

  1. I use VGG19 pretrained on ImageNet.
  2. I fine-tune the VGG19 model on CUB at 448x448.
  3. When fine-tuning at 448, I change the last pooling layer to stride 4 and kernel size 4.
  4. If you don't use FC6 and FC7, i.e. pool5(global_AVE)+softmax(200), you may get 74+% acc. Otherwise, you can get 78% acc.
  5. When testing on the VAL set, just use a central crop without mirroring, since that is the default in Caffe.

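Item 3 is what keeps the pretrained fully connected layers usable at 448x448 input. A quick sanity check of the arithmetic (assuming VGG19's conv stack downsamples by a factor of 16 before pool5):

```python
def pool_out(n, kernel, stride):
    """Spatial output size of a valid (no-padding) pooling layer."""
    return (n - kernel) // stride + 1

# VGG19's conv stack downsamples 448 by a factor of 16 before pool5: 448 -> 28.
feat = 448 // 16                    # 28
# The standard pool5 (kernel 2, stride 2) would give 14x14, which no longer
# matches FC6's expected 7x7x512 = 25088 input.
standard = pool_out(feat, 2, 2)     # 14
# With kernel 4, stride 4 the output is 7x7 again, so the pretrained
# FC6 weights (25088 -> 4096) can be reused unchanged.
modified = pool_out(feat, 4, 4)     # 7
```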
simo23 commented 6 years ago

Ok, thank you very much for the answer. I get the 78% now.

youhebuke commented 6 years ago

@simo23 @chenbinghui1 Hi, I got about 74.5% acc using pool5(global_ave, kernel size 28, stride 28)+FC(512x200)+softmax, just as chenbinghui1 said. But I can't get 78% acc using pool5(kernel size 4, stride 4)+FC6+FC7+FC8new(4096x200)+softmax; I only get about 65% acc. I wonder where the problem is. Could you help me? I really need your help, thank you.

simo23 commented 6 years ago

Hi, @youhebuke! The relevant details of my training are:

  • Last pooling layer modified to stride=4, kernel size=4, but still MAX pooling, not AVG
  • New layer initialized with biases=0 and weights = random gaussian with std dev = 0.01
  • Random 448 crop with random flip at training time
  • Central 448 crop at test time
  • Train the new FC layer with learning rate 1e-3 and all the other layers with learning rate 1e-4
  • Batch size = 32
  • L2 regularization on all weights (not biases) with decay=5e-4, as in standard VGG
  • Preprocess both train and test images by subtracting the VGG RGB mean values. Be careful to subtract the right value from the right channel: check whether your image-loading function returns RGB or BGR.

Let me know if this helps!
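One detail that often trips people up is the mean subtraction and channel order. A minimal sketch, assuming the image is loaded as an HxWx3 RGB array (the standard Caffe VGG means are published in BGR order, so they are reordered here):

```python
import numpy as np

# Standard VGG training means, reordered from Caffe's BGR to RGB.
VGG_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def subtract_vgg_mean(image_rgb):
    """Subtract the per-channel VGG mean from an HxWx3 RGB image.

    If your loader returns BGR (e.g. OpenCV), reverse the channel axis
    first, or subtract VGG_MEAN_RGB[::-1] instead.
    """
    return image_rgb.astype(np.float32) - VGG_MEAN_RGB
```

Subtracting the means in the wrong channel order usually costs a couple of points of accuracy rather than failing outright, which makes it easy to miss.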

super-wcg commented 6 years ago

@simo23 Hi, did you train the RA-CNN? How did you define the loss?

simo23 commented 6 years ago

Hi @super-wcg, I did not train the RA-CNN, sorry.

chenfeima commented 6 years ago

@simo23 Hello, I only get 75+% accuracy. Can you share your train.prototxt with me?

chenfeima commented 6 years ago

@youhebuke Did you solve the problem? I am also using pool5(kernel size 4, stride 4)+FC6+FC7+FC8new(4096x200)+softmax, but only get 75+% accuracy.

simo23 commented 6 years ago

Hi @chenfeima, I do not use Caffe so I cannot share a prototxt, but the details are already written in an earlier answer. Maybe you just need to train a little longer?

chenfeima commented 6 years ago

@simo23 Thank you! Have you implemented the RA-CNN? What about your ranking loss and training strategy?

simo23 commented 6 years ago

Hi @chenfeima, I did not try to reproduce the RA-CNN, sorry. By the way, there is now a more interesting work by the same team, Multiattention.

chenfeima commented 6 years ago

@simo23 That one is more difficult. I want to reproduce RA-CNN first. Have you implemented the ranking loss?

simo23 commented 6 years ago

@chenfeima No, I did not implement it.

RTMDFG commented 6 years ago

@chenfeima Did you implement the RA-CNN?

whyou5945 commented 6 years ago

@simo23 @chenbinghui1 @youhebuke Thanks for your discussion revealing the details of training VGG-19 on the CUB bird dataset. You mentioned "Random 448 crop" in the training process: do you mean resizing the shorter side and then cropping 448 randomly?

caoquanjie commented 5 years ago

@simo23 Hi, could you help me? I only get about 65% acc using pool5(kernel size 4, stride 4)+FC6+FC7+FC8(4096x200)+softmax. I followed your training process as described above and implemented it in TensorFlow. I don't know where the problem is, and I really need your help, thank you.

simo23 commented 5 years ago

Hi @caoquanjie,

there could be a million issues related to your training, so I am not sure what is going on. One thing that may be missing and surely has a huge impact is the initialization. Do you start training from scratch or from a model pre-trained on ImageNet?

caoquanjie commented 5 years ago

@simo23 Thank you for your reply; I just solved this problem yesterday. I start the training process from a model pre-trained on ImageNet. First, I fine-tune only fc8 with a learning rate of 1e-3 for 5000 steps, then train all variables (including the convolution variables) with a learning rate of 1e-3 for 10000 steps. Finally, I use a learning rate of 1e-4 for another 10000 steps in the same way as before. Maybe the choice of optimizer was the problem: I switched to SGD and then got 77.4% accuracy. Anyway, thank you for your reply.
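The three-stage schedule above can be written as a simple step-to-configuration map; this is just a sketch, and the stage labels are hypothetical names:

```python
def finetune_stage(step):
    """Return (which parameters to train, learning rate) for a training step,
    following the three-stage schedule described above."""
    if step < 5000:
        return "fc8_only", 1e-3    # warm up the new classifier head first
    elif step < 15000:
        return "all_layers", 1e-3  # then unfreeze the whole network
    else:
        return "all_layers", 1e-4  # finally decay the learning rate
```

Warming up the freshly initialized head before unfreezing the pretrained conv layers avoids large early gradients wrecking the pretrained features.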

MubarkLa commented 5 years ago

Hi, @youhebuke! The relevant details of my training are:

  • Last pooling layer modified to stride=4, kernel size=4, but still MAX pooling, not AVG
  • New layer initialized with biases=0 and weights = random gaussian with std dev = 0.01
  • Random 448 crop with random flip at training time
  • Central 448 crop at test time
  • Train the new FC layer with learning rate 1e-3 and all the other layers with learning rate 1e-4
  • Batch size = 32
  • L2 regularization on all weights (not biases) with decay=5e-4, as in standard VGG
  • Preprocess both train and test images by subtracting the VGG RGB mean values. Be careful to subtract the right value from the right channel: check whether your image-loading function returns RGB or BGR.

Let me know if this helps!

Hi, @simo23

May I ask whether you used any dropout layer in the VGG19 when fine-tuning on the bird dataset? Thank you.

hamedbehzadi commented 3 years ago

@simo23 Thank you for your reply; I just solved this problem yesterday. I start the training process from a model pre-trained on ImageNet. First, I fine-tune only fc8 with a learning rate of 1e-3 for 5000 steps, then train all variables (including the convolution variables) with a learning rate of 1e-3 for 10000 steps. Finally, I use a learning rate of 1e-4 for another 10000 steps in the same way as before. Maybe the choice of optimizer was the problem: I switched to SGD and then got 77.4% accuracy. Anyway, thank you for your reply.

Hi @caoquanjie, may I ask for some details? Did you reach that accuracy using only the different learning rates in the multi-stage training procedure? Did you also modify the architecture, such as changing the pooling layer as discussed by others?

Thank you in advance for your attention.