endernewton / tf-faster-rcnn

Tensorflow Faster RCNN for Object Detection
https://arxiv.org/pdf/1702.02138.pdf
MIT License

ResNext #27

Closed · bhack closed this 7 years ago

bhack commented 7 years ago

Do you plan to experiment with ResNeXt (https://github.com/wenxinxu/ResNeXt-in-tensorflow)?

endernewton commented 7 years ago

I am still debugging ResNet. I will think about the direction after I am done with that and see how high the AP can get on COCO. However, ResNeXt is probably not the next step I want to explore, because it does not seem to help AP much (in the original paper). I would probably start with FPN first if I want better AP.


bhack commented 7 years ago

You can see an ablation of ResNeXt in https://arxiv.org/abs/1703.06870 (the Mask R-CNN paper).

bhack commented 7 years ago

See Table 3.

endernewton commented 7 years ago

Oh yes, I saw that come out. It uses additional supervision from masks as well, so it is not an exact apples-to-apples comparison for box-only annotation. The old number in their original paper is very close for ResNet-101. The variation of AP on COCO is smaller, but on VOC the variation with different random seeds can be as high as 1%.



bhack commented 7 years ago

It seems they have analyzed distinct contributions: RoIAlign (+1.1 APbb), multi-task training (+0.9 APbb), and ResNeXt-101 (+1.6 APbb). All top-performing backbones use FPN.

endernewton commented 7 years ago

Yes, but the 1.6 gain is on top of multi-task learning (38.2 -> 39.8). What I mean is, they do not report the improvement from ResNeXt alone: how much does it gain over 36.2/37.3, the FPN numbers (with or without RoIAlign)?

Also, isn't RoIAlign just crop_and_resize from TensorFlow? Any idea?
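
For context, this is roughly what using tf.image.crop_and_resize as the RoI pooling op looks like. A minimal sketch only: the function name and the normalization by image size are illustrative, not necessarily the exact code in this repo (the exact normalization is discussed further down in the thread).

```python
import tensorflow as tf

def crop_pool_layer(feature_map, rois, img_height, img_width, pooled_size=7):
    """feature_map: [1, H, W, C] conv feature map.
    rois: [N, 4] boxes in image pixels, ordered (x1, y1, x2, y2)."""
    img_h = tf.cast(img_height, tf.float32)
    img_w = tf.cast(img_width, tf.float32)
    # crop_and_resize takes boxes normalized to [0, 1], ordered (y1, x1, y2, x2)
    x1, y1, x2, y2 = tf.unstack(rois, axis=1)
    boxes = tf.stack([y1 / img_h, x1 / img_w, y2 / img_h, x2 / img_w], axis=1)
    box_ind = tf.zeros([tf.shape(rois)[0]], dtype=tf.int32)  # single-image batch
    # Bilinear sampling; the box coordinates are never rounded to integers
    return tf.image.crop_and_resize(feature_map, boxes, box_ind,
                                    [pooled_size, pooled_size])
```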



bhack commented 7 years ago

The problem is that we don't have an explicit entry in Table 3 for Faster R-CNN with RoIAlign and a ResNeXt-FPN backbone to compare against. But from the table, even just Faster R-CNN with ResNet and RoIAlign (without multi-task training) seems to improve. RoIAlign is described in some detail in Section 3 and seems to use bilinear interpolation.
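
For reference, "bilinear interpolation" here just means reading a fractional location off the feature map by blending its four surrounding cells. A minimal NumPy sketch, purely illustrative:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map at a fractional location (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom
```

RoIAlign samples a few such fractional points inside each output bin and aggregates them (max or average), while crop_and_resize bilinearly resamples the whole crop to the output size.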

endernewton commented 7 years ago

They say they will release the code over the summer. I believe it will be in Caffe2, so we will see. :)

Anyway, crop_and_resize does perform better in my experiments, and it uses bilinear interpolation.



bhack commented 7 years ago

FAIR group... so probably Torch/PyTorch.

endernewton commented 7 years ago

Yeah, anything but TensorFlow.

bhack commented 7 years ago

Figure 5, table (c), compares the contribution of align and interpolation for the different pooling layers.

philokey commented 7 years ago

@endernewton I think RoIAlign may be somewhat different from crop_and_resize. RoIAlign not only uses bilinear interpolation, it also uses x/16 instead of [x/16]. However, you use tf.ceil to calculate the new width and height for crop_and_resize, and I think that will introduce quantization.

endernewton commented 7 years ago

@philokey Actually, I am using x/16. The ceil is there to measure the size of the entire conv5 feature map, because crop_and_resize only takes relative coordinates. Hmm, you actually remind me that the feature map size in ResNet-101 may be different from VGG-16. That needs some verification, or maybe I should just take the size of the feature map directly so it is clearer.

philokey commented 7 years ago

@endernewton Why do you use ceil(x / 16 - 1) * 16? When x % 16 == 0, it seems wrong.

endernewton commented 7 years ago

@philokey "x" there is height or width factor for the entire image, not the locations.. please double check.

bhack commented 7 years ago

Just a heads-up: https://github.com/CharlesShang/FastMaskRCNN /cc @CharlesShang

endernewton commented 7 years ago

Thanks for the pointer. If you are interested, you can try to incorporate that and submit a pull request. Implementing Mask R-CNN is not on my to-do list, at least for now.

bhack commented 7 years ago

Could you be interested in the pyramid network: https://github.com/CharlesShang/FastMaskRCNN/blob/master/libs/nets/pyramid_network.py?

endernewton commented 7 years ago

I will get to that in time. As mentioned above, once I am done with the ResNet part, I will probably implement FPN if further improving AP is needed. Implementing it will likely take more time because their official code is not released, so we will need to bug the authors to clarify the details. It will take a few more rounds.

zacwellmer commented 7 years ago

I have attempted to implement FPN with ResNet-50. I'm currently struggling to initialize the network with the ResNet-50 ImageNet weights. If anyone gets a chance, the repo is here; nearly all the changes are in pyramid.py. Let me know if you have any suggestions.

lichengunc commented 7 years ago

Why do we need the "-1" in the relative location computation? I.e., height = (conv5_height - 1) * 16, then x_factor = x / height.
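
A possible answer, assuming the tf.image.crop_and_resize convention that a normalized coordinate v is sampled at v * (dimension - 1): dividing by (conv5_height - 1) * 16 makes an image coordinate that falls on the last conv5 row or column map to exactly 1.0. A sketch with made-up numbers:

```python
stride = 16.0
conv5_width = 38                                  # e.g. a ~600-px-wide image at stride 16

x_img = (conv5_width - 1) * stride                # image x on the last conv5 column
x_factor = x_img / ((conv5_width - 1) * stride)   # = 1.0
# crop_and_resize samples this at 1.0 * (conv5_width - 1) = column 37, the last one.
# Without the "-1" in the denominator, the same point would map to 37/38 ≈ 0.974
# and be sampled about one column short of the feature map edge.
```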

chang010453 commented 4 years ago

> I have attempted to implement FPN with ResNet-50. I'm currently struggling to initialize the network with the ResNet-50 ImageNet weights. If anyone gets a chance, the repo is here; nearly all the changes are in pyramid.py. Let me know if you have any suggestions.

Hi, have you finished your implementation of FPN?