facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0
26.26k stars 5.45k forks

Detect small objects in RetinaNet, with bbox ~10x20 pixels - using CONV1 - doesn't fit the RetinaNet code #283

Open EinavNamer opened 6 years ago

EinavNamer commented 6 years ago

Issue #202 describes a similar problem. There, it was suggested to upsample the input image, which doesn't seem reasonable to me.

My solution is to use the higher-resolution feature pyramid levels, including conv1 & conv2. So, in order to detect small objects using RetinaNet-FPN-ResNet50 on my custom data, I created my own config file, where I used:

FPN:
  FPN_ON: True
  MULTILEVEL_RPN: True
  RPN_MAX_LEVEL: 5
  RPN_MIN_LEVEL: 1
  COARSEST_STRIDE: 128
  EXTRA_CONV_LEVELS: False

In FPN.py it is mentioned that:

# Lowest and highest pyramid levels in the backbone network. For FPN, we assume
# that all networks have 5 spatial reductions, each by a factor of 2. Level 1
# would correspond to the input image, hence it does not make sense to use it.
LOWEST_BACKBONE_LVL = 2   # E.g., "conv2"-like level
HIGHEST_BACKBONE_LVL = 5  # E.g., "conv5"-like level

However, conv1 in ResNet50 is down-sampled twice (stride=2 in both the conv and the max pool), and therefore differs from the input image:

p = model.Conv('data', 'conv1', 3, 64, 7, pad=3, stride=2, no_bias=1)
p = model.AffineChannel(p, 'res_conv1_bn', dim=64, inplace=True)
p = model.Relu(p, p)
p = model.MaxPool(p, 'pool1', kernel=3, pad=1, stride=2)

see https://github.com/facebookresearch/Detectron/blob/master/lib/modeling/ResNet.py#L94
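To make the stride mismatch concrete, here is a small sketch (plain Python, not Detectron code; the stage list just mirrors the ResNet-50 stem quoted above) that computes the effective stride and feature-map size at each stage:

```python
# Effective stride of each ResNet-50 stage, assuming the standard stem:
# conv1 (stride 2) followed by pool1 (stride 2), then one stride-2 block
# per later residual stage. The "conv1" output is already input/2, and
# pool1 makes the res2 input input/4 -- so no backbone feature map
# actually matches the input resolution, as the question points out.
stages = [
    ("data",  1),   # raw input image
    ("conv1", 2),   # 7x7 conv, stride 2
    ("pool1", 2),   # 3x3 max pool, stride 2 -> feeds res2
    ("res2",  1),   # res2 keeps pool1's resolution
    ("res3",  2),
    ("res4",  2),
    ("res5",  2),
]

def effective_strides(stages):
    out, stride = {}, 1
    for name, s in stages:
        stride *= s
        out[name] = stride
    return out

strides = effective_strides(stages)
h = w = 800  # hypothetical input size
for name, s in strides.items():
    print("%-6s stride %2d -> %dx%d" % (name, s, h // s, w // s))
```

So a "level 1" built on conv1 would sit at stride 2, not at the input resolution the FPN.py comment describes.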

Run results:

When using my config file, I get:

File "/home/einav/Projects/detectron/lib/modeling/FPN.py", line 47, in add_fpn_ResNet50_conv5_body
    model, ResNet.add_ResNet50_conv5_body, fpn_level_info_ResNet50_conv5
File "/home/einav/Projects/detectron/lib/modeling/FPN.py", line 105, in add_fpn_onto_conv_body
    model, fpn_level_info_func()
File "/home/einav/Projects/detectron/lib/modeling/FPN.py", line 163, in add_fpn
    lateral_input_blobs[i + 1],  # lateral blob

Process finished with exit code 1

My Question:

What do you suggest in order to find small objects, given that: (1) the minimum anchor box is 32 pixels, which is too big for me; and (2) can I create the ResNet50 conv1 without the two down-sampling stages?

System information

Ubuntu 16.04, single GPU, Python 2.7. In general, training & testing RetinaNet on COCO works successfully.

Thanks a lot!!! Einav

xmyqsh commented 6 years ago
  1. (Recommended) Try to use or add a smaller anchor scale. The stride of P3 is 8, so you could set the min bbox to 8 pixels.

  2. add P2

  3. (Not recommended) Add P1 before the first pool op.
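The effect of these knobs on the smallest detectable box can be sketched numerically. Assuming RetinaNet's usual convention, where the base anchor size at level P_l is the level's stride (2**l) times ANCHOR_SCALE, expanded by SCALES_PER_OCTAVE sub-octave scales (the exact numbers here are illustrative):

```python
# Sketch of RetinaNet anchor sizes per FPN level, assuming the usual
# convention: base size at level l is 2**l * anchor_scale, multiplied
# by 2**(i/n) for each of n = scales_per_octave sub-octave scales.
def anchor_sizes(min_level, max_level, anchor_scale, scales_per_octave):
    sizes = {}
    for lvl in range(min_level, max_level + 1):
        stride = 2 ** lvl
        sizes["P%d" % lvl] = [
            stride * anchor_scale * 2 ** (i / float(scales_per_octave))
            for i in range(scales_per_octave)
        ]
    return sizes

# Default-ish setting: smallest anchor on P3 is 8 * 4 = 32 px ...
print(anchor_sizes(3, 7, anchor_scale=4, scales_per_octave=3)["P3"][0])  # 32.0
# ... with a smaller anchor scale of 2, it drops to 16 px on P3.
print(anchor_sizes(3, 8, anchor_scale=2, scales_per_octave=3)["P3"][0])  # 16.0
```

This is why suggestion 1 (a smaller anchor scale) can reach ~10x20-px objects without touching the backbone at all.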

EinavNamer commented 6 years ago

How can a "conv1"-like layer correspond to the input image? We surely lose small details when down-sampling. Thanks, Einav

xmyqsh commented 6 years ago

If the default number of anchors is changed, the hyperparameters of the focal loss should also be adjusted.
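For reference, the hyperparameters in question are alpha and gamma from the RetinaNet focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t). A minimal single-anchor sketch (not Detectron's implementation, which works on whole tensors):

```python
import math

# Minimal binary focal-loss sketch (RetinaNet, Lin et al. 2017).
# alpha balances positive vs. negative anchors; gamma down-weights
# easy examples. Adding finer levels multiplies the number of (mostly
# negative, mostly easy) anchors, which is why these may need retuning.
def focal_loss(p, target, alpha=0.25, gamma=2.0):
    p_t = p if target == 1 else 1.0 - p            # prob of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy negative (p=0.1) contributes far less than a hard one (p=0.9):
print(focal_loss(0.1, target=0))  # small
print(focal_loss(0.9, target=0))  # orders of magnitude larger
```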

xmyqsh commented 6 years ago

You could add P0 which has the same resolution of the original input image.

That said, I think there is no need for P0 or P1.

Anyway, you could have a try.

rbruga commented 6 years ago

@EinavNamer, I am trying to address a similar issue and you might be able to assist. I want to detect small objects in large images, and I have observed that changing TEST.SCALE https://github.com/facebookresearch/Detectron/blob/a026d7753512d7f8c1b92235c89e02c38f7b1cea/configs/12_2017_baselines/e2e_mask_rcnn_X-101-32x8d-FPN_2x.yaml#L48 in e2e_mask_rcnn_X-101-32x8d-FPN to larger values allows me to detect smaller objects in the same input image during inference (I am not retraining the model). My hypothesis is that less resizing happens to the original image (hence higher resolution), but I am having a hard time understanding which part of the network allows the input size to be customized. If you look at TRAIN.SCALES and TRAIN.MAX_SIZE, the model has been trained for 800x1333 inputs. Any insights?
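What TEST.SCALE does at inference, roughly: the image is resized so its short side hits TEST.SCALE, capped so the long side stays under TEST.MAX_SIZE. A sketch of that resizing rule (not the library's actual function; the sizes below are made up):

```python
# Sketch of the short-side/long-side resize rule used by Detectron-style
# configs (TEST.SCALE / TEST.MAX_SIZE): scale so min(h, w) == target_size,
# but cap the scale so max(h, w) never exceeds max_size.
def compute_im_scale(h, w, target_size=800, max_size=1333):
    im_scale = float(target_size) / min(h, w)
    if round(im_scale * max(h, w)) > max_size:
        im_scale = float(max_size) / max(h, w)
    return im_scale

# A hypothetical 2000x4000 image at the default 800/1333 is capped by
# max_size to a scale of ~0.33, so a 15-px object shrinks to ~5 px;
# raising TEST.SCALE (and TEST.MAX_SIZE) keeps more of those pixels.
print(compute_im_scale(2000, 4000))                                  # capped
print(compute_im_scale(2000, 4000, target_size=1600, max_size=2666)) # larger
```

This would explain the observation: a larger TEST.SCALE means less shrinking before the network ever sees the image, though anchors and features stay tuned for the training scale.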

EinavNamer commented 6 years ago

No insights at the moment; I'm dealing with a GPU out-of-memory issue after changing the RetinaNet architecture and adding P2 as suggested.

When I trained RetinaNet with FPN: RPN_MAX_LEVEL: 7, RPN_MIN_LEVEL: 3, all worked well (single GPU, GeForce GTX 1070).

Now I'm trying to run train_net.py with the following config:

FPN:
  RPN_MAX_LEVEL: 5
  RPN_MIN_LEVEL: 2

and I get the following out-of-memory error:

terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what(): [enforce fail at context_gpu.cu:343] error == cudaSuccess. 2 vs 0.
  Error at: ...caffe2/core/context_gpu.cu:343: out of memory Error from operator:

It crashes in RunNet(name, num_iter=1, allow_fail=False).

Question:

  1. What more can I do to reduce GPU memory use? I already changed BATCH_SIZE_PER_IM to 16 instead of 64.
  2. Is there some kind of manual for the config parameters? What does IMS_PER_BATCH: 2 mean? RPN_BATCH_SIZE_PER_IM? TRAIN.SCALE, TEST.SCALE?

Thanks a lot!! Einav

xmyqsh commented 6 years ago

@EinavNamer P2 will consume lots of memory. Your small objects are 10x20 pixels; P3 will handle them well. Try changing these parameters:

RPN_MAX_LEVEL: 7
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 128
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 4

to:

RPN_MAX_LEVEL: 8      # depends on the size of your large objects
RPN_MIN_LEVEL: 3
COARSEST_STRIDE: 256  # depends on RPN_MAX_LEVEL
SCALES_PER_OCTAVE: 3
ANCHOR_SCALE: 2

If your large objects are not too large, you could also make TRAIN.SCALE and TEST.SCALE smaller and adapt the above parameters accordingly. That would be a better choice.

If memory is limited, try adjusting IMS_PER_BATCH, NUM_CONVS, ASPECT_RATIOS, SCALES_PER_OCTAVE, TRAIN.SCALE, TEST.SCALE, RPN_MAX_LEVEL, RPN_MIN_LEVEL, or COARSEST_STRIDE.
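A rough way to see why dropping RPN_MIN_LEVEL to 2 blows up memory: each finer level quadruples the number of anchor locations, since the stride halves in both dimensions. A back-of-the-envelope sketch (the 9 anchors per location assume 3 aspect ratios x 3 scales per octave; sizes are illustrative):

```python
# Back-of-the-envelope anchor count per FPN level for an HxW input,
# assuming A = len(ASPECT_RATIOS) * SCALES_PER_OCTAVE anchors per
# location. Each step down a level quadruples the locations, which is
# why adding P2 (stride 4) is so much more expensive than using P3.
def anchors_per_level(h, w, min_level, max_level, anchors_per_loc=9):
    counts = {}
    for lvl in range(min_level, max_level + 1):
        stride = 2 ** lvl
        counts["P%d" % lvl] = (h // stride) * (w // stride) * anchors_per_loc
    return counts

c = anchors_per_level(800, 1333, min_level=2, max_level=7)
print(c)
print("P2 alone has ~%dx the anchors of P3" % (c["P2"] // c["P3"]))
```

The same quadrupling applies to the classification and regression head activations at that level, not just the anchors, so the OOM on a single GTX 1070 is unsurprising.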

ersanliqiao commented 6 years ago

@xmyqsh Thank you very much for your advice!