simo23 opened this issue 7 years ago
Hi, thanks for your answer.
Unfortunately, I am already performing isotropic scaling and random cropping on the images.
If you don't mind, I have some questions:
Thanks
On Sat, Sep 30, 2017 at 3:20 AM, bhchen notifications@github.com wrote:
@simo23 https://github.com/simo23 Hi. Maybe I can answer your question: the important thing is data preprocessing. I suggest you normalize the shortest edge of the original image to 512 while keeping the original aspect ratio, then use a random crop of size 448*448 during training. I use the original VGG-19 model and achieve 78.3% acc on CUB. Good luck to you.
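chenbinghui1's preprocessing (shortest edge resized to 512, then a random 448 crop) can be sketched roughly as below. This is a minimal NumPy sketch with nearest-neighbour resampling for illustration only; a real pipeline would use bilinear interpolation, and the function names here are made up, not from any of the posters' code:

```python
import numpy as np

def resize_shortest_side(img, target=512):
    """Resize so the shortest edge equals `target`, keeping the aspect ratio.
    Nearest-neighbour resampling keeps this dependency-free."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

def random_crop(img, size=448, rng=np.random):
    """Take a random size x size crop (training-time augmentation)."""
    h, w = img.shape[:2]
    top = rng.randint(0, h - size + 1)
    left = rng.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]

img = np.zeros((600, 800, 3), dtype=np.uint8)   # dummy landscape image
resized = resize_shortest_side(img)             # shape (512, 683, 3)
crop = random_crop(resized)                     # shape (448, 448, 3)
```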
@simo23 Hi. To your questions:
Ok, thank you very much for the answer. I get the 78% now.
@simo23 @chenbinghui1 Hi, I got about 74.5% acc using pool5 (global average, kernel size 28, stride 28) + FC(512x200) + softmax, just as chenbinghui1 said. But I can't get 78% acc using pool5 (kernel size 4, stride 4) + FC6 + FC7 + FC8new (4096x200) + softmax; I just got about 65% acc. I wonder where the problem is. Could you help me? I really need your help, thank you.
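For reference, the two pool5 configurations being compared produce these output shapes. This is simple arithmetic, assuming VGG-19's four stride-2 pools before pool5 reduce a 448x448 crop to a 28x28x512 feature map:

```python
def pool_out(size, kernel, stride):
    """Output spatial size of a pooling layer (no padding)."""
    return (size - kernel) // stride + 1

# Four stride-2 pools before pool5 shrink 448 -> 28,
# so conv5_4 outputs a 28x28x512 feature map.
feat = 448 // 2**4                                   # 28

# Option A (chenbinghui1): kernel 28, stride 28 = global average pool,
# giving 1x1x512, followed by a new FC(512x200).
print(pool_out(feat, kernel=28, stride=28))          # prints 1

# Option B (simo23): pool5 with kernel 4, stride 4 gives 7x7x512,
# i.e. the same 7*7*512 = 25088 vector that a 224 input produces,
# so the pretrained FC6/FC7 weights can be reused unchanged.
print(pool_out(feat, kernel=4, stride=4))            # prints 7
print(pool_out(feat, 4, 4) ** 2 * 512)               # prints 25088
```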
Hi, @youhebuke! The relevant details of my training are:
Let me know if this helps!
@simo23 Hi, did you train the RA-CNN? How did you define the loss?
Hi @super-wcg, I did not train the RA-CNN sorry.
@simo23 Hello, I only get 75+ accuracy. Can you share your train.prototxt with me?
@youhebuke Did you solve the problem? I am also using pool5 (kernel size 4, stride 4) + FC6 + FC7 + FC8new (4096x200) + softmax, but I only get 75+ accuracy.
Hi @chenfeima, I do not use Caffe so I cannot share the prototxt, but the details are already written in an earlier answer. Maybe you just need to train for a little longer?
@simo23 Thank you! Have you implemented the RA-CNN? What about your rank loss and the training strategy?
Hi @chenfeima, I did not try to reproduce the RA-CNN, sorry. By the way, there is now a more interesting work by the same team, Multi-Attention CNN.
@simo23 That one is more difficult. I want to implement the RA-CNN first. Do you have the rank loss?
@chenfeima No, I did not implement it.
@chenfeima Did you implement the RA-CNN?
@simo23 @chenbinghui1 @youhebuke Thanks for your discussion revealing the details of training VGG-19 on the CUB bird dataset. You mentioned a "Random 448 crop" in the training process: do you mean resizing the shorter side to 448 and then cropping 448x448 randomly?
@simo23 Hi, could you help me? I got only about 65% acc using pool5 (kernel size 4, stride 4) + FC6 + FC7 + FC8 (4096x200) + softmax. I followed the training process you described above and implemented it with TensorFlow. I don't know where the problem is, and I really need your help. Thank you.
Hi @caoquanjie,
there could be a million issues with your training, so I am not sure what is going on. One thing that may be missing, and that surely has a huge impact, is the initialization. Do you start the training from scratch or from a model pre-trained on ImageNet?
@simo23 Thank you for your reply, I solved this problem yesterday. I start the training process from a model pre-trained on ImageNet. First, I fine-tune only fc8 with a learning rate of 1e-3 for 5000 steps, and then train all variables (including the convolutional ones) with a learning rate of 1e-3 for 10000 steps. Finally, I train for 10000 more steps in the same way with a learning rate of 1e-4. Maybe the choice of optimizer was the problem; I later switched to SGD and then got 77.4% accuracy. Anyway, thank you for your reply.
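caoquanjie's three-stage recipe can be sketched as a simple schedule. This is only an illustration of the staging; `train_step`, `run_schedule`, and the variable-scope labels are hypothetical placeholders, not a real framework API:

```python
# Three-stage fine-tuning schedule: (trainable scope, learning rate, steps).
STAGES = [
    ("fc8 only",      1e-3,  5_000),   # warm up the new classifier head
    ("all variables", 1e-3, 10_000),   # then fine-tune the whole network
    ("all variables", 1e-4, 10_000),   # finish at a lower learning rate
]

def run_schedule(train_step, stages=STAGES):
    """Drive a training loop through the stages in order.
    `train_step` is called once per optimization step with the current
    scope label and learning rate; returns the total step count."""
    total = 0
    for scope, lr, steps in stages:
        for _ in range(steps):
            train_step(scope=scope, lr=lr)
            total += 1
    return total
```

The design point is simply that the randomly initialized head is trained first so its large initial gradients do not wreck the pre-trained convolutional features.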
Hi, @youhebuke! The relevant details of my training are:
- Last pooling layer modified to stride = 4, kernel size = 4, but still MAX pooling, not AVG
- New layer initialized with biases = 0 and weights drawn from a random Gaussian with std dev = 0.01
- Random 448 crop with random flip at training time
- Central 448 crop at test time
- Train the new FC layer with learning rate 1e-3 and all the other layers with learning rate 1e-4
- Batch size = 32
- L2 regularization on all weights, not biases, with decay = 5e-4, as in standard VGG
- Preprocess both train and test images by subtracting the VGG RGB mean values. Be careful that you subtract the right value from the right channel: check the function that imports the images from file and make sure whether the imported image is RGB or BGR.
Let me know if this helps!
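The mean-subtraction bullet above can be sketched like this; the channel means are the standard ImageNet values used with the Caffe VGG models, and the helper names are illustrative:

```python
import numpy as np

# ImageNet channel means of the Caffe VGG models, in RGB order.
# Caffe itself loads images as BGR, so the same values appear reversed
# there -- exactly the channel mix-up the bullet warns about.
VGG_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def preprocess(img_rgb):
    """Subtract the per-channel VGG mean from an RGB uint8 image."""
    return img_rgb.astype(np.float32) - VGG_MEAN_RGB

def preprocess_bgr(img_bgr):
    """Same, for an image loaded in BGR order (e.g. by OpenCV)."""
    return img_bgr.astype(np.float32) - VGG_MEAN_RGB[::-1]
```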
Hi, @simo23
May I ask whether you used any dropout layer in the VGG-19 when fine-tuning on the bird dataset? Thank you.
Hi @caoquanjie, can I ask you for some details? Did you reach this accuracy by using only different learning rates in a multi-stage training procedure? Did you also modify the architecture, such as changing the pooling layer as discussed by others?
Thank you in advance for your attention.
Hi, first of all thanks for your great work!
In your paper you cite the VGG-19 [27] model and state that it achieves 77.8% accuracy on the CUB-200-2011 dataset. Can you please give some more info about this? Are you referring to the model trained only on ImageNet, to a model fine-tuned by you, or to one fine-tuned by someone else? Is it the Caffe model?
And if you did train it, can you share some details, like batch size, learning rate, number of epochs, and data augmentation?
Thanks, Andrea