Hi! I don't quite understand what you are saying. So you get the right AP with the trained models, but I am lost when you say that the results with the reference model are lower. Do you mean that you trained a baseline model (e.g. VGG16 Fast R-CNN) for the task and its resulting AP is lower? To help you further, could you give me the details of which model you used for evaluation and/or training that got you the lower AP?
Thanks!
Hi, thanks for your help. Following the usage instructions for R∗CNN (https://github.com/gkioxari/RstarCNN), I trained a VGG16 network and tested the R∗CNN classifier on VOC 2012, and the resulting AP is lower than that reported in Table 1 of the paper. The scripts for training and testing are as follows.
./tools/train_net.py --gpu 0 --solver models/VGG16_RstarCNN/solver.prototxt --weights reference_models/VGG16.v2.caffemodel
./tools/test_net.py --gpu 0 --def models/VGG16_RstarCNN/test.prototxt --net output/default/voc_2012_train/vgg16_fast_rcnn_joint_train_iter_40000.caffemodel
The VGG16 reference model (i.e. VGG16.v2.caffemodel) was downloaded from http://www.cs.berkeley.edu/%7Egkioxari/RstarCNN/reference_models.tar.gz, and the model used for evaluation was trained with the script above.
P.S. On the VOC 2012 dataset, I get the right AP for R∗CNN with the trained model downloaded from http://www.cs.berkeley.edu/%7Egkioxari/RstarCNN/trained_models.tar.gz. So I guess something is wrong with my training script.
So, for the baseline model you should be using the right prototxts (in RstarCNN/models/VGG16/) and the right scripts for the vanilla model. According to your commands, you are using the R∗CNN model, which is the final model and not the baseline.
The reference model is trained on ImageNet and is NOT the baseline/vanilla model. It is the model used to initialize the weights of all trained models.
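For what it's worth, that initialization is roughly what the --weights flag in the training command does. Below is a minimal pycaffe sketch of the idea; it is an illustration only, not the repo's actual training loop (which lives in lib/fast_rcnn/train.py), and it reuses the paths quoted in this thread.

```python
# Illustration: an ImageNet-pretrained reference model initializes the
# weights of the network being trained; it is not itself a trained baseline.
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# Solver for the model being trained (R*CNN here; the baseline would use
# the corresponding prototxts under models/VGG16/).
solver = caffe.SGDSolver('models/VGG16_RstarCNN/solver.prototxt')

# Copy the ImageNet-pretrained weights into layers with matching names.
solver.net.copy_from('reference_models/VGG16.v2.caffemodel')
```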
Thanks very much. Can you give me the right script for training an R∗CNN model on the VOC 2012 dataset?
In models/VGG16/train.prototxt you can see that the data layer appropriate for the baseline (vanilla) model is DataLayer, the definition of which can be found in lib/data_layer/layer.py.
For training: all you need to do is import the correct roidb in lib/fast_rcnn/train.py, which gets called by the main tools/train_net.py. There you should substitute line 17 with import data_layer.roidb as rdl_roidb.
For testing: in tools/test_net.py, substitute line 13 with from fast_rcnn.test_vanilla import test_net.
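A minimal sketch of what those two substitutions look like in place (only the substituted import lines come from the instructions above; the quoted line numbers are the ones referenced in this thread):

```python
# lib/fast_rcnn/train.py, line 17 -- baseline (vanilla) model:
# replace the stock roidb import
#   import roi_data_layer.roidb as rdl_roidb
# with the data_layer version:
import data_layer.roidb as rdl_roidb

# tools/test_net.py, line 13 -- switch to the vanilla test function:
from fast_rcnn.test_vanilla import test_net
```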
I understand that the setup is a bit confusing, but I decided to go that way to avoid having a huge number of files for each baseline model. Instead, you have one source function and you have to import the right roidb to make it work. This holds not only for the baseline model but also for the random model, the scene model and the attributes model.
I hope that helped.
Thanks for your help. I have downloaded the selective search regions from http://www.cs.berkeley.edu/%7Egkioxari/RstarCNN/ss_voc2012.tar.gz.
My aim is to obtain the results of 'R∗CNN (0.2, 0.75)' in Table 1. Can you tell me which scripts to use to train that model?
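(As an aside, the (0.2, 0.75) in that table entry appear, as far as one can tell from the paper, to be the lower and upper overlap bounds used when selecting candidate secondary regions around a primary person box. The sketch below only illustrates what such bounds mean; the helper names are hypothetical and not taken from the RstarCNN code.)

```python
# Hypothetical illustration of (lower, upper) overlap bounds such as (0.2, 0.75):
# keep only proposals whose IoU with the primary (person) box lies in that range.
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def candidate_secondary_regions(primary_box, proposals, lower=0.2, upper=0.75):
    """Hypothetical helper: filter proposals by overlap with the primary box."""
    return [p for p in proposals if lower <= iou(primary_box, p) <= upper]
```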
The original line 17 of lib/fast_rcnn/train.py is import roi_data_layer.roidb as rdl_roidb. It seems that lib/roi_data_layer/roidb.py and lib/data_layer/roidb.py are the same.
The results obtained with the following scripts are much lower (e.g., the AP for phoning is 0.138). Can you give detailed instructions for training and testing R∗CNN?
./tools/train_net.py --gpu 0 --solver models/VGG16_RstarCNN/solver.prototxt --weights reference_models/VGG16.v2.caffemodel
./tools/test_net.py --gpu 0 --def models/VGG16_RstarCNN/test.prototxt --net output/default/voc_2012_train/vgg16_fast_rcnn_joint_train_iter_40000.caffemodel
Hi all, when I run the code on the PASCAL VOC 2012 Action dataset with the trained models, the output is similar to that in the paper. However, the results obtained by training from the reference model are lower than those in the paper. For example, the AP for phoning is 0.138, which is much lower than the value reported in the paper. The training scripts I used are the ones given above.