[Update:] I've further simplified the code to PyTorch 1.5 and torchvision 0.6, and replaced the custom RoIPool and NMS ops with the ones from torchvision. If you want the old version of the code, please check out branch v1.0.
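Since the custom NMS op was swapped for `torchvision.ops.nms`, here is a minimal pure-Python sketch of what that operation computes (box format and suppression rule follow torchvision's convention; this is an illustration, not the library implementation):

```python
def nms(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression.

    boxes: list of (x1, y1, x2, y2); scores: list of floats.
    Returns indices of kept boxes, highest score first.
    """
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining box
        keep.append(i)
        # Suppress boxes overlapping it by more than the threshold.
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep
```

`torchvision.ops.nms` does the same on GPU tensors and returns the kept indices as a tensor.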
This project is a simplified Faster R-CNN implementation based on chainercv and other projects. I hope it can serve as a starting point for those who want to understand the details of Faster R-CNN. It aims to:
And it has the following features:
VGG16 trained on the trainval split and tested on the test split.
Note: training shows great randomness; you may need a bit of luck and more epochs of training to reach the highest mAP. However, it should be easy to surpass the lower bound.
Implementation | mAP |
---|---|
original paper | 0.699 |
train with Caffe pretrained model | 0.700-0.712 |
train with torchvision pretrained model | 0.685-0.701 |
model converted from chainercv (reported 0.706) | 0.7053 |
Implementation | GPU | Inference | Training |
---|---|---|---|
original paper | K40 | 5 fps | NA |
This[1] | TITAN Xp | 14-15 fps | 6 fps |
pytorch-faster-rcnn | TITAN Xp | 15-17 fps | 6 fps |
[1]: Make sure you install cupy correctly and that only one program runs on the GPU. Training speed is sensitive to your GPU status. See troubleshooting for more info. Moreover, the program is slow at the start; it needs time to warm up.
It could be made faster by removing visualization, logging, loss averaging, etc.
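As a rough illustration of how such fps numbers can be measured, here is a generic timing sketch (not the script this repo uses); note the warm-up iterations, matching the warm-up caveat above:

```python
import time

def measure_fps(fn, n_iters=50, warmup=5):
    """Time repeated calls to fn and return frames per second."""
    # Warm up: the first iterations are slower (allocation, kernel caching).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

For GPU inference you would additionally need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, otherwise the measurement only covers kernel launches.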
Here is an example of creating the environment from scratch with Anaconda:
```bash
# create conda env
conda create --name simp python=3.7
conda activate simp
# install pytorch
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
# install other dependencies
pip install visdom scikit-image tqdm fire ipdb pprint matplotlib torchnet
# start visdom
nohup python -m visdom.server &
```
If you don't use Anaconda, then:

- install PyTorch with GPU support (the code is GPU-only); refer to the official website
- install other dependencies: `pip install visdom scikit-image tqdm fire ipdb pprint matplotlib torchnet`
- start visdom for visualization:

```bash
nohup python -m visdom.server &
```
Download the pretrained model from Google Drive or Baidu Netdisk (password: scxn).
See demo.ipynb for more detail.
Download the training, validation, and test data and the VOCdevkit:
```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
```
Extract all of these tars into one directory named VOCdevkit:
```bash
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
```
It should have this basic structure:

```
$VOCdevkit/           # development kit
$VOCdevkit/VOCcode/   # VOC utility code
$VOCdevkit/VOC2007    # image sets, annotations, etc.
# ... and several other directories ...
```
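As a quick sanity check of the extracted data, a small sketch like the following could verify the layout (the subdirectories `JPEGImages`, `Annotations`, and `ImageSets` are the standard VOC2007 contents, not shown in the listing above):

```python
import os

def check_voc_layout(root):
    """Return the list of expected VOC directories missing under root."""
    expected = [
        'VOC2007',
        os.path.join('VOC2007', 'JPEGImages'),
        os.path.join('VOC2007', 'Annotations'),
        os.path.join('VOC2007', 'ImageSets'),
    ]
    return [p for p in expected
            if not os.path.isdir(os.path.join(root, p))]
    # An empty return value means the layout looks right.
```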
Modify the `voc_data_dir` cfg item in `utils/config.py`, or pass it to the program with an argument like `--voc-data-dir=/path/to/VOCdevkit/VOC2007/`.
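The config pattern used by `utils/config.py` can be sketched roughly as follows (attribute names other than `voc_data_dir` and `plot_every` are illustrative, and the real file has many more options):

```python
class Config:
    # Default values; any of these can be overridden via keyword arguments
    # (the repo wires this to the command line with the `fire` library).
    voc_data_dir = '/path/to/VOCdevkit/VOC2007/'
    plot_every = 100
    use_drop = False

    def _parse(self, kwargs):
        # Override only attributes that already exist, so typos fail loudly.
        for k, v in kwargs.items():
            if not hasattr(self, k):
                raise ValueError('Unknown option: --%s' % k)
            setattr(self, k, v)

opt = Config()
```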
If you want to use the Caffe-pretrained model as the initial weights, you can run the command below to get VGG16 weights converted from Caffe, which are the same ones the original paper uses.
```bash
python misc/convert_caffe_pretrain.py
```
This script downloads the pretrained model and converts it to a format compatible with torchvision. If you are in China and cannot download the pretrained model, you may refer to this issue.
Then you can specify where the Caffe-pretrained model `vgg16_caffe.pth` is stored in `utils/config.py` by setting `caffe_pretrain_path`. The default path is fine.
If you want to use pretrained model from torchvision, you may skip this step.
NOTE: the Caffe-pretrained model has shown slightly better performance.
NOTE: the Caffe model requires images in BGR with values in 0-255, while the torchvision model requires images in RGB with values in 0-1. See `data/dataset.py` for more detail.
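The two conventions can be sketched as follows (the BGR means and the ImageNet mean/std here are the commonly used values; check `data/dataset.py` for the exact numbers the repo uses):

```python
import numpy as np

def caffe_normalize(img):
    """img: float32 RGB array in [0, 255], shape (C, H, W)."""
    img = img[::-1, :, :]  # RGB -> BGR channel order
    # Per-channel BGR means commonly used with the Caffe VGG16.
    mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)
    return img - mean      # mean-subtracted, still 0-255 scale

def pytorch_normalize(img):
    """img: float32 RGB array in [0, 255], shape (C, H, W)."""
    img = img / 255.0      # scale to [0, 1], keep RGB order
    # Standard ImageNet statistics used by torchvision models.
    mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
    std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
    return (img - mean) / std
```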
```bash
python train.py train --env='fasterrcnn' --plot-every=100
```
You may refer to `utils/config.py` for more arguments.
Some key arguments:

- `--caffe-pretrain=False`: use the pretrained model from Caffe or torchvision (default: torchvision)
- `--plot-every=n`: visualize predictions, loss, etc. every `n` batches
- `--env`: visdom env for visualization
- `--voc_data_dir`: where the VOC data is stored
- `--use-drop`: use dropout in the RoI head, default False
- `--use-Adam`: use Adam instead of SGD, default SGD (you need to set a very low `lr` for Adam)
- `--load-path`: pretrained model path, default `None`; if specified, the model would be loaded

You may open a browser, visit `http://<ip>:8097`, and see the visualization of the training procedure as below:
dataloader: received 0 items of ancdata

See the discussion; it is already fixed in train.py, so you should be free from this problem.
Windows support
I don't have a Windows machine with a GPU to debug and test on. A pull request adding and testing Windows support would be welcome.
This work builds on many excellent works, which include:
Licensed under MIT, see the LICENSE for more detail.
Contributions welcome.
If you encounter any problem, feel free to open an issue, though I've been too busy lately to respond quickly.
Correct me if anything is wrong or unclear.
(figure: model structure)