Training end-to-end from scratch

Feynman27 commented 6 years ago

Has anyone had any luck training this model end-to-end without external object proposals or pretraining the RFCN network on Imagenet DET? I've been trying to train the D(&T loss) model in pytorch and have only reached a frame mean AP of ~64% on the full imagnet VID validation set. Some implementation notes:

I'm only training/testing on Imagenet VID (I am not using anything from Imagenet DET).
As in the paper, I'm sampling 10 frames from each video snippet in the training set. These frames are sampled at regular intervals across the duration of the snippet.
I'm using resnet-101 with pretrained imagenet weights and am randomly initializing the RPN and RCNN.
I'm using correlation features on conv3, conv4, and conv5 and am regressing on the ground truth boxes in frame t --> t+tau.
I am using an L1 smooth loss for the tracking loss.
I am not linking detections across frames at the moment.
I am using a batch size of 2 (2 images per video, 2 videos = 4 frames total)
My initial lr is 5e-4

123liluky commented 6 years ago

Could you share your steps for traning the model and things we need pay attention to. I have downloaded ILSVRC2017 and I am going to train D&T model. Without RPN proposals provided for ILSVRC2017, what changes should I do in the source code? I found RPN proposals in video_generate_random_minibatch.m and I stopped here. Thank you very much.

Feynman27 commented 6 years ago

Sure:

Downloaded the ILSVRC2015 VID and created a script to subsample 10 frames from each video snippet in the training set. I sampled the frames from within each video snippet at evenly-spaced intervals. I wrote these to train.txt, and a sample is show below:

train/ILSVRC2015_train_00712000/000000 1 0 693
train/ILSVRC2015_train_00712000/000077 1 77 693
train/ILSVRC2015_train_00712000/000154 1 154 693
train/ILSVRC2015_train_00712000/000231 1 231 693
train/ILSVRC2015_train_00712000/000308 1 308 693
train/ILSVRC2015_train_00712000/000385 1 385 693
train/ILSVRC2015_train_00712000/000462 1 462 693
train/ILSVRC2015_train_00712000/000539 1 539 693
train/ILSVRC2015_train_00712000/000616 1 616 693
train/ILSVRC2015_train_00712000/000693 1 693 693

The last column is the total frames in the video snippet, and the column to the left of it is the sampled frame number.

Any frames that do not have a track correspondence with the next frame are removed from the training roidb.
I built the RFCN siamese network and used the nvidia flownet2 correlation layer here. Currently, my code only supports 2 frames per input sample.
I added a tracking proposal layer that regress the ground truth boxes in frame t --> t+tau.
I trained the network using an initial learning rate of 5e-4 and a batch size of 2. When the validation mAP stopped improving, I decayed the lr by a factor of 10.

I'm still trying to reach >70% val mAP.

123liluky commented 6 years ago

conf: batch_size: 2 bbox_class_agnostic: 1 bbox_thresh: 0.5000 bg_thresh_hi: 0.5000 bg_thresh_lo: 0 fg_fraction: -1 fg_thresh: 0.5000 image_means: [224x224x3 single] ims_per_batch: 1 max_size: 1000 rng_seed: 6 scales: 600 test_binary: 0 test_max_size: 1000 test_nms: 0.3000 test_scales: 600 use_flipped: 1 use_gpu: 1 root_path: 'D:\work\code\Detect-Track-master\Detect-Track-master\models\data\ILSVRC\' input_modality: 'rgb' sample_vid: 1 nFramesPerVid: 2 time_stride: 1 num_proposals: 300 regressTracks: 1 regressAllTracks: 0 opts: cache_name: 'rfcn_ILSVRC_ResNet50_DetectTrack_rgbtstride_1track_reg_lowerLR_batch8' caffe_version: 'caffe-rfcn' conf: [1x1 struct] do_val: 1 imdb_train: {[1x1 struct]} imdb_val: [1x1 struct] net_file: 'D:\work\code\Detect-Track-master\Detect-Track-master\models\pre_trained_models\ResNet-50L\ResNet-50-D-ilsvrc-vid.caffemodel' offline_roidb: 1 output_dir: 'D:\work\code\Detect-Track-master\Detect-Track-master\models\data\ILSVRC\' resume_iter: 0 roidb_train: {[1x1 struct]} roidb_val: [1x1 struct] snapshot_interval: 10000 solver_def_file: 'D:\work\code\Detect-Track-master\Detect-Track-master\models\rfcn_prototxts\ResNet-50L_ILSVRCvid_corr\solver_160k240k_lr1_4.p...' val_interval: 5000 val_iters: 1500 visualize_interval: 0 Today, I used parameters like above and ILSVRC2015 train and val video data to train D&T. Because my gpu memory is only 2G. Then, error happened like below: error using parallel.FevalFuture/fetchNext (line 243). The function evaluation completed with an error. error using rfcn_train (line 141) [~, net_inputs] = fetchNext(parHandle); error using script_DetectTrack_ILSVRC_vid_ResNet_OHEM_rpn (line 77) opts.rfcn_model = rfcn_train(conf, dataset.imdb_train, dataset.roidb_train, ... reason: error using rfcn_get_minibatch>sample_rois (line 252). Input must be non-negative integer. I doubt some parameter setting above was wrong, so error happened. As you know, to reduce gpu memory usage, which parameters can I change? I am not sure about the relationship among batch_size,ims_per_batch,sample_vid and nFramesPerVid. Thanks for your help.

Feynman27 commented 6 years ago

As mentioned above, I'm not using the matlab implementation. I'm writing my own python implementation using PyTorch. I'm only using this repo as a guide.

I have 2 TitanX GPUs (12 gbs each). I think 2 GB is too small. You probably need at least 8 GB.

As for the hyperparameters, your best strategy is to read the source matlab code, but my guess:

batch_size: just the number of samples per batch
ims_per_batch: number of total images in the batch (e.g. for a duel-frame siamese net, ims_per_batch=4)
sample_vid: bool: whether to subsample frames from the video snippet
nFramesPerVid: number of frames to use per sample in a single forward pass (e.g. 2)

LiangXu123 commented 6 years ago

@Feynman27 Hello man,it seems that we two are doing the same thing now，i am still working on training DT in pytorch，and i use correlation layer comes from nvidia flownet2 too. here are some of my experiment results:

I'm  training/testing on **Imagenet VID+  Imagenet DET** just as the paper did.
As in the paper, I'm sampling 10 frames from each video snippet in the training set. These frames are sampled at regular intervals across the duration of the snippet.
I'm using resnet-101 with pretrained imagenet weights and am randomly initializing the RPN and RCNN.
I'm using correlation features on conv3, conv4, and conv5 and am regressing on the ground truth boxes in frame t --> t+tau.
I am using an L1 smooth loss for the tracking loss.
I am not linking detections across frames at the moment.
I am using a batch size of **32** (2 images per video, 16 videos = 32 frames total)
My initial lr is **1e-3**

and for now, I got mAP on full Imagenet VID validation as below： R-FCN one frame baseline： 68.6% vs paper 74.2% R-FCN one frame baseline+finetune on DT with 1e-4 lr： 69.7% vs paper 75.8%

and i think the main difference between us is the training data and the learning rate schedule,you can use the DET+VID dataset to train your network,and see how that improves your mAP.

and I am still trying to reach >70% val mAP.

the most important thing is I don't get it why the MATLAB version baseline got 74.2%,any idea？

Feynman27 commented 6 years ago

Great. Happy to hear someone else is building a Pytorch implementation! I've just started training with the alternating VID+DET heuristic. We'll see if that boosts the performance.

LiangXu123 commented 6 years ago

my result : R-FCN one frame baseline+finetune on DT with 1e-4 lr： 69.7% is got by using alternating sample VID or DET in each iteration,but i did not know how to improve my result further

LiangXu123 commented 6 years ago

and for now，i am trying to use alternating sample from VID or DET in each iteration everywhere，not only in finetune the DT from RFCN as the paper says

Feynman27 commented 6 years ago

I just trainined the R-FCN single-frame baseline using ImageNet VID+DET and reached a frame mAP on the ImageNet VID validation set of 70.3%. Still several percentage points away from the 74% paper result under the same conditions, but much better than just training on Imagenet VID.

My initial lr was 1e-3, decayed every 3 epochs up to 11 epochs.

Hopefully, initializing the D (&T loss) network with these weights will squeeze out another percentage point or two.

LiangXu123 commented 6 years ago

sure， in early time of my experiment， I also get RFCN single frame baseline mAP of 70.9%，but there is still a long way to get 74.6%，keep me posted with further experiment result！

123liluky commented 6 years ago

I am doing step 1 of Setup on Ubuntu16.04. While I compile caffe-rfcn downloaded from https://github.com/feichtenhofer/caffe-rfcn, error is: ./include/caffe/parallel.hpp(99): warning: type qualifier on return type is meaningless

/usr/local/include/boost/system/error_code.hpp:233:21: error: looser throw specifier for ‘virtual const char boost::system::error_category::std_category::name() const’ virtual const char name() const BOOST_NOEXCEPT ^ /usr/include/c++/5/system_error:77:21: error: overriding ‘virtual const char* std::_V2::error_category::name() const noexcept’ name() const noexcept = 0; ^ /usr/local/include/boost/system/error_code.hpp:243:37: error: looser throw specifier for ‘virtual std::error_condition boost::system::error_category::std_category::default_error_condition(int) const’ virtual std::error_condition default_error_condition( int ev ) const ^ /usr/include/c++/5/system_error:104:25: error: overriding ‘virtual std::error_condition std::_V2::error_category::default_error_condition(int) const noexcept’ default_error_condition(int i) const noexcept; ^ /usr/local/include/boost/system/error_code.hpp:245:21: error: looser throw specifier for ‘virtual bool boost::system::error_category::std_category::equivalent(int, const std::error_condition&) const’ virtual bool equivalent( int code, const std::error_condition & condition ) const ^ /usr/include/c++/5/system_error:107:14: error: overriding ‘virtual bool std::_V2::error_category::equivalent(int, const std::error_condition&) const noexcept’ equivalent(int __i, const error_condition& cond) const noexcept; ^ /usr/local/include/boost/system/error_code.hpp:247:21: error: looser throw specifier for ‘virtual bool boost::system::error_category::std_category::equivalent(const std::error_code&, int) const’ virtual bool equivalent( const std::error_code & code, int condition ) const ^ /usr/include/c++/5/system_error:110:14: error: overriding ‘virtual bool std::_V2::error_category::equivalent(const std::error_code&, int) const noexcept’ equivalent(const error_code& code, int i) const noexcept; ^ Makefile:591: recipe for target '.build_release/cuda/src/caffe/layers/correlation_layer.o' failed make: *** [.build_release/cuda/src/caffe/layers/correlation_layer.o] Error 1 Could you help me? My Makefile.config is: ######## Refer to http://caffe.berkeleyvision.org/installation.html ######## Contributions simplifying and improving our build system are welcome!

######## cuDNN acceleration switch (uncomment to build with cuDNN). USE_CUDNN := 1

######## CPU-only switch (uncomment to build without GPU support). ######## CPU_ONLY := 1

######## uncomment to disable IO dependencies and corresponding data layers USE_OPENCV := 1 USE_LEVELDB := 1 USE_LMDB := 1

######## uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary) ######## You should not set this flag if you will be reading LMDBs with any ######## possibility of simultaneous read and write ######## ALLOW_LMDB_NOLOCK := 1

######## Uncomment if you're using OpenCV 3 OPENCV_VERSION := 3

######## To customize your choice of compiler, uncomment and set the following. ######## N.B. the default for Linux is g++ and the default for OSX is clang++ CUSTOM_CXX := g++

######## CUDA directory contains bin/ and lib/ directories that we need. CUDA_DIR := /usr/local/cuda-8.0 ######## On Ubuntu 14.04, if cuda tools are installed via ######## "sudo apt-get install nvidia-cuda-toolkit" then use this instead: ######## CUDA_DIR := /usr

######## CUDA architecture setting: going with all of them. ######## For CUDA < 6.0, comment the *_50 lines for compatibility. CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \ -gencode arch=compute_20,code=sm_21 \ -gencode arch=compute_30,code=sm_30 \ -gencode arch=compute_35,code=sm_35 \ -gencode arch=compute_50,code=sm_50 \ -gencode arch=compute_50,code=compute_50

######## BLAS choice: ######## atlas for ATLAS (default) ######## mkl for MKL ######## open for OpenBlas BLAS := atlas ######## Custom (MKL/ATLAS/OpenBLAS) include and lib directories. ######## Leave commented to accept the defaults for your choice of BLAS ######## (which should work)! ######## BLAS_INCLUDE := /path/to/your/blas ######## BLAS_LIB := /path/to/your/blas

######## Homebrew puts openblas in a directory that is not on the standard search path ######## BLAS_INCLUDE := $(shell brew --prefix openblas)/include ######## BLAS_LIB := $(shell brew --prefix openblas)/lib

######## This is required only if you will compile the matlab interface. ######## MATLAB directory should contain the mex binary in /bin. MATLAB_DIR := /usr/local/MATLAB/R2014b ######## MATLAB_DIR := /Applications/MATLAB_R2012b.app

######## NOTE: this is required only if you will compile the python interface. ######## We need to be able to find Python.h and numpy/arrayobject.h. ########PYTHON_INCLUDE := /usr/include/python2.7 \ /usr/lib/python2.7/dist-packages/numpy/core/include ######## Anaconda Python distribution is quite popular. Include path: ######## Verify anaconda location, sometimes it's in root. ######## ANACONDA_HOME := $(HOME)/anaconda ######## PYTHON_INCLUDE := $(ANACONDA_HOME)/include \ ######## $(ANACONDA_HOME)/include/python2.7 \ ######## $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

######## Uncomment to use Python 3 (default is Python 2) ######## PYTHON_LIBRARIES := boost_python3 python3.5m ######## PYTHON_INCLUDE := /usr/include/python3.5m \ ######## /usr/lib/python3.5/dist-packages/numpy/core/include

######## We need to be able to find libpythonX.X.so or .dylib. PYTHON_LIB := /usr/lib ######## PYTHON_LIB := $(ANACONDA_HOME)/lib

######## Homebrew installs numpy in a non standard path (keg only) ######## PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.file)'))/include ######## PYTHON_LIB += $(shell brew --prefix numpy)/lib

######## Uncomment to support layers written in Python (will link against Python libs) ######## WITH_PYTHON_LAYER := 1

######## Whatever else you find you need goes here. INCLUDE_DIRS := /usr/local/include /usr/include/hdf5/serial/ ########$(PYTHON_INCLUDE) LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial

######## If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies ######## INCLUDE_DIRS += $(shell brew --prefix)/include ######## LIBRARY_DIRS += $(shell brew --prefix)/lib

######## Uncomment to use pkg-config to specify OpenCV library paths. ######## (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.) ######## USE_PKG_CONFIG := 1

######## N.B. both build and distribute dirs are cleared on make clean BUILD_DIR := build DISTRIBUTE_DIR := distribute

######## Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171 ######## DEBUG := 1

######## The ID of the GPU that 'make runtest' will use to run unit tests. TEST_GPUID := 0

######## enable pretty build (comment to see full commands) Q ?= @

And I am using cuda8.0 cudnn4 matlab2014b. Thank you very much!

Cris-zj commented 6 years ago

Hi~can I ask some questions? how to get the rois_disp, bbox_targets_disp and bbox_loss_weights_disp which will be input into the network if I want to train the model in pytorch? thank you for your reply @Feynman27 @cc786537662

Feynman27 commented 6 years ago

The roi targets are predicted offsets of the boxes relative to your anchors: see Eq. 2 from this paper. The final bbox target layer should predict refined displacements relative to the initial bbox predictions from above. The bbox loss weights are used to mask out background rois so that only the foreground rois are used in the L1 bbox loss. There's also an "outside" weight (called lambda in the paper above) that weights the L1 bbox loss in the overall multi-task loss.

Here's a good pytorch implementation: https://github.com/jwyang/faster-rcnn.pytorch.

Cris-zj commented 6 years ago

Thank you for your reply~ I still don't understand the displacement part, I think the displacement in matlab code is so difficult to understand.

Cris-zj commented 6 years ago

Hi~I have known the displacement, thank you very much. I feel confused about the fine-tune with the tracking loss The number of the images input into the model is 1 or 2 during D(& T loss)? If it is 1, how to get loss between 2 images. If it is 2, what's the differences between D(& T loss) and D&T(tau) ? Thank you for your reply~~ @Feynman27 @cc786537662

Cris-zj commented 6 years ago

Hi, sorry about disturb you again. I chaned the tau to 10, the result is soooo bad. but the result with tau equaled to 1 is good. Have you run the code with tau equaled to 10? Please reply to me. the question troubled me so long. Thank you very much for your reply @Feynman27 @cc786537662 @feichtenhofer

qingzew commented 6 years ago

@Feynman27 @cc786537662 @Cris-zj can you share your code, I am trying to apply this code to my own data, but it's hard to me, i have no idea about matlab

HJW0522 commented 6 years ago

@Feynman27 @Cris-zj Hi, I want to ask you some questions. When I try to run the train scripts, I also cannot get the rois_disp, bbox_targets_disp and bbox_loss_weights_disp in input data. How did you solve this problem? Thank you~

imnaren142 commented 6 years ago

@zorrocai Hey....I am working on D&T in matlab.Have you tested the code in matlab?

zorrocai commented 6 years ago

@naren142 No, I didn't. I just try to rewrite D&T in pytorch.

zorrocai commented 6 years ago

Hi, every one. I have rewritten the D&T architecture in pytorch, but I found that there are so many tricks to achieve the original result in the paper, like data process "Linking tracklets to object tubes", and the pretrained RPN, RFCN and so on. I am very busy at the moment, if anyone has interests in this project, please contact with me. Maybe we can work together.

imnaren142 commented 6 years ago

@zorrocai hey can you please share your email??

zorrocai commented 6 years ago

@naren142 909241818@qq.com

feichtenhofer / Detect-Track

Training end-to-end from scratch #21