Closed by bhack 7 years ago
I am still debugging ResNet. I will think about the direction after getting done with that and see how much AP I can reach on COCO. However, ResNeXt is probably not the next step I want to explore, because it does not seem to help AP much (in the original paper); I would probably start with FPN first if I want better AP.
Ender X. Chen
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
On Mon, Mar 27, 2017 at 5:12 AM, bhack notifications@github.com wrote:
Do you plan to experiment with ResNext https://github.com/wenxinxu/ResNeXt-in-tensorflow?
You can see an ablation of ResNeXt in https://arxiv.org/abs/1703.06870
See Table 3.
Oh yes, I saw that coming out; it uses additional supervision from masks as well, so it is not an exact apples-to-apples comparison for the bbox results. The old number in their original paper is very close for ResNet-101. The variation of AP on COCO is smaller, but on VOC the variation with different random seeds can be as high as 1%.
It seems that they have analyzed distinct contributions: using RoIAlign (+1.1 APbb), multi-task training (+0.9 APbb), and ResNeXt-101 (+1.6 APbb). All top-performing backbones use FPN.
Yes, but the +1.6 is on top of the multi-task learning (38.2 -> 39.8). What I mean is, they do not report the improvement of ResNeXt alone: how much does it gain over 36.2/37.3, the FPN numbers (with or without RoIAlign)?
Also, I think RoIAlign is just crop_and_resize from TensorFlow? Any idea?
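For what it's worth, the connection can be sketched in plain Python (a hypothetical helper, not the repo's actual code): converting an RoI given in image pixels into the normalized [y1, x1, y2, x2] box that tf.image.crop_and_resize expects, keeping exact floating-point feature-map coordinates instead of rounding to the grid.

```python
def roi_to_normalized_box(x1, y1, x2, y2, feat_h, feat_w, stride=16):
    """Map an RoI in image pixels to a normalized box for
    tf.image.crop_and_resize, without quantizing to the feature grid."""
    # Project image coordinates onto the feature map (no rounding).
    fy1, fx1 = y1 / stride, x1 / stride
    fy2, fx2 = y2 / stride, x2 / stride
    # crop_and_resize maps a normalized coordinate v to v * (size - 1),
    # so normalize by (size - 1) rather than size.
    return [fy1 / (feat_h - 1), fx1 / (feat_w - 1),
            fy2 / (feat_h - 1), fx2 / (feat_w - 1)]

# Example: a 128x80-pixel RoI on a 38x50 conv feature map.
box = roi_to_normalized_box(32, 16, 160, 96, feat_h=38, feat_w=50)
```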
The problem is that we don't have an explicit entry in Table 3 for Faster R-CNN with RoIAlign and a ResNeXt-FPN backbone to compare against. But from the table, plain Faster R-CNN with ResNet and RoIAlign (without multi-task training) does seem to improve. RoIAlign is defined in detail in Section 3 and seems to use bilinear interpolation.
They say they will release the code over the summer. I believe it will be in Caffe2. So we will see. :)
Anyway, crop_and_resize does do better in my experiments, and it uses bilinear interpolation.
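As a quick reference for the interpolation being discussed, here is a minimal pure-Python sketch of bilinear sampling at a fractional location (an illustration only, not the TensorFlow kernel itself):

```python
def bilinear(feat, y, x):
    """Sample a 2-D feature map (list of lists) at fractional (y, x)
    by blending the four surrounding cells."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(feat) - 1)
    x1 = min(x0 + 1, len(feat[0]) - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0][x0] * (1 - dy) * (1 - dx)
            + feat[y0][x1] * (1 - dy) * dx
            + feat[y1][x0] * dy * (1 - dx)
            + feat[y1][x1] * dy * dx)

# Sampling halfway between four cells averages them.
val = bilinear([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5)
```

This is what lets RoIAlign (and crop_and_resize) read the feature map at non-integer coordinates instead of snapping to the nearest cell.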
FAIR group... probably Torch/PyTorch.
Yeah anything but tensorflow.
Figure 5's table (c) compares the align and interpolation contributions for the different pooling layers.
@endernewton I think RoIAlign may be somewhat different from crop_and_resize. RoIAlign not only uses bilinear interpolation, it also uses x/16 instead of [x/16]. However, you use tf.ceil to compute the new width and height for crop_and_resize, and I think that introduces quantization.
@philokey actually I am using x/16.. the ceil thing is to measure the size of the entire conv5 feature map, because crop_and_resize only takes relative sizes.. hmm, now you actually remind me that the feature-map size in ResNet-101 may be different from VGG-16. Needs some verification there, or maybe I should just take the size of the feature map directly so it looks clearer.
@endernewton Why do you use ceil(x / 16 - 1) * 16? When x % 16 == 0, it seems wrong.
@philokey "x" there is the height or width of the entire image, not an RoI location.. please double check.
Just to notify https://github.com/CharlesShang/FastMaskRCNN /cc @CharlesShang
Thanks for the pointer. If you are interested, you can try to incorporate that and submit a pull request. Implementing Mask R-CNN is not on my bucket list, at least for now.
Could you be interested in the pyramid network, https://github.com/CharlesShang/FastMaskRCNN/blob/master/libs/nets/pyramid_network.py?
I will do that at some point. As mentioned above, once I am done with the ResNet part, I will probably implement FPN if further improving AP is needed. Implementing it will likely take more time because their official code is not released, so we would need to bug the authors to clarify the details. So it will take a few more rounds.
I have attempted to implement FPN with ResNet-50. I'm currently struggling to use the ResNet-50 ImageNet weights to initialize the network. If anyone gets a chance, the repo is here; nearly all the changes took place in pyramid.py. Let me know if you have any suggestions.
Why do we need the "-1" in the relative-location computation? I.e., height = (conv5_height - 1) * 16, then x_factor = x / height.
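One way to see where the "-1" comes from: tf.image.crop_and_resize is documented to map a normalized coordinate v onto image coordinate v * (size - 1), so normalized 1.0 means the last cell, index size - 1. A small sketch (hypothetical names, not the repo's code) under that assumption:

```python
STRIDE = 16

def normalize(x, feat_size, stride=STRIDE):
    """Normalize an image pixel coordinate so that crop_and_resize
    samples the feature map at exactly x / stride."""
    # crop_and_resize maps v -> v * (feat_size - 1), hence the -1 here.
    return x / ((feat_size - 1) * stride)

feat_h = 38
v = normalize(300, feat_h)
# Feature index crop_and_resize would sample for image pixel 300:
idx = v * (feat_h - 1)   # cancels to 300 / 16
```

Dividing by feat_size * stride instead would shift every sample slightly toward the origin, since the (size - 1) factor would no longer cancel.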
Hi, have you finished your implementation of FPN?