Open WongKinYiu opened 5 years ago
https://arxiv.org/abs/1908.04646v2
xNets can be applied to any backbone, similar to FPNs. ... We detect corners for objects of different sizes and aspect ratios using different matrix layers, and simplify the matching process by removing the embedding layer entirely and regressing the object centers directly. We show that KP-xNet outperforms all existing single-shot detectors by achieving 47.8% mAP on the MS COCO benchmark.
xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP of 47.8 on MS COCO, which is higher than any other single-shot detector while using half the number of parameters and training 3x faster than the next best architecture.
@WongKinYiu
Could this repo support max pooling layers with different x and y strides? I would like to implement the state-of-the-art object detector. Thanks.
Is this the only necessary feature for the implementation of the Matrix Net? So we should have:
stride_x=
stride_y=
Or do you need something else?
@AlexeyAB Hello, yes, but it would be better if it could also support convolutional and average pooling layers with stride_x and stride_y. The paper seems to use convolutional layers for down-sampling.
@WongKinYiu Hi,
I added support for
[maxpool]
stride_x=2
stride_y=3
...
Try to build a network with it, and if it works well and increases accuracy, I will add stride_x & stride_y for the convolutional layer.
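To illustrate what separate stride_x and stride_y values do to a feature map, here is a minimal pure-Python sketch of max pooling with independent horizontal and vertical strides. It is not darknet's implementation: the function name `maxpool2d` is hypothetical, and the sketch assumes no padding, whereas a real framework may pad the input and so produce a different output size.

```python
def maxpool2d(grid, size, stride_x, stride_y):
    """Max-pool a 2D list with independent x/y strides (assumption: no padding)."""
    in_h, in_w = len(grid), len(grid[0])
    out_h = (in_h - size) // stride_y + 1
    out_w = (in_w - size) // stride_x + 1
    return [
        [
            max(
                grid[oy * stride_y + ky][ox * stride_x + kx]
                for ky in range(size)
                for kx in range(size)
            )
            for ox in range(out_w)
        ]
        for oy in range(out_h)
    ]


# 6 rows x 8 columns; each value encodes (row, column) so results are easy to check.
feature_map = [[10 * r + c for c in range(8)] for r in range(6)]

# Same stride values as the cfg fragment above: stride_x=2, stride_y=3.
pooled = maxpool2d(feature_map, size=2, stride_x=2, stride_y=3)
print(len(pooled), len(pooled[0]))  # rows, columns of the pooled map
```

With these values the width shrinks by the x stride and the height by the y stride, so the output aspect ratio differs from the input's.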
@AlexeyAB Hello, thank you very very much.
It seems that MatrixNet (different strides) and TridentNet (different dilations) are very promising approaches for generalizing different sizes and aspect ratios of objects.
@WongKinYiu Hi,
Any progress?
I added stride_x=... stride_y=... for convolutional layer. So you can try to make MatrixNet.
[convolutional]
stride_x=2
stride_y=3
...
https://arxiv.org/pdf/1908.04646v2.pdf
2.1. Layer Generation .... The upper triangular layers are obtained by applying a series of shared 3x3 convolutions with stride 1x2 on the diagonal layers. Similarly, the bottom left layers are obtained using shared 3x3 convolutions with stride 2x1. The parameters are shared across all down-sampling convolutions to minimize the number of new parameters.
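The layer generation quoted above can be sketched as a grid of feature-map shapes. This is only an illustration under stated assumptions: the function name `matrix_layer_shapes` is hypothetical, the row index is taken to track extra height down-sampling and the column index extra width down-sampling, and the quote alone does not settle which axis "1x2" refers to.

```python
def matrix_layer_shapes(base_h, base_w, n):
    """Return {(row, col): (height, width)} for an n x n matrix of layers.

    Assumption: moving right halves the width only, moving down halves the
    height only, and diagonal layers are halved in both dimensions.
    """
    shapes = {}
    for k in range(n):
        shapes[(k, k)] = (base_h >> k, base_w >> k)
    for i in range(n):
        for j in range(n):
            if j > i:
                # Upper triangle: start from diagonal (i, i), halve width (j - i) times.
                h, w = shapes[(i, i)]
                shapes[(i, j)] = (h, w >> (j - i))
            elif i > j:
                # Lower triangle: start from diagonal (j, j), halve height (i - j) times.
                h, w = shapes[(j, j)]
                shapes[(i, j)] = (h >> (i - j), w)
    return shapes


shapes = matrix_layer_shapes(64, 64, 3)
for i in range(3):
    print([shapes[(i, j)] for j in range(3)])
```

Each off-diagonal cell ends up with a non-square feature map, which is what lets a layer specialize in wide or tall objects.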
@WongKinYiu Do you understand:
w=4x h=1x
share the same weights?)
@AlexeyAB Hello, the version with max pooling layers of different strides is now training for 200k epochs. It needs 7~9 more days to finish training.
@WongKinYiu are you training a matrixnet with the original resnext-101 backbone? or something else?
@WongKinYiu Thanks, do you know about "shared 3x3 convolutions"? https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-526918433
@AlexeyAB Yes, I think it is similar to TridentNet. In my understanding, only layers with the same aspect ratio share the same weights. But for the max pooling version, there are no parameters.
@LukeAI Hello. I do not use big models like resnet-50, resnext-101... I only train very small models.
which one? :) darknet-53 ?
@WongKinYiu Thanks.
You can try to train 3 models:
with [maxpool] stride_x=... stride_y=...
with [convolutional] stride_x=... stride_y=... with share_index=... in some layers, to get weights from the conv-layer with this number (as in TridentNet)
then with [convolutional] stride_x=... stride_y=... antialiasing=1 with share_index=... in some layers
And compare results.
I added the antialiasing=1 parameter for the [convolutional] layer: https://github.com/AlexeyAB/darknet/issues/3672
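The idea of several strided convolutions reusing one set of weights can be sketched in pure Python. This is not how darknet implements share_index: the function name `conv3x3`, the single-channel input, the zero padding of 1, and the example kernel are all assumptions made for illustration.

```python
def conv3x3(grid, kernel, stride_x, stride_y):
    """Single-channel 3x3 cross-correlation with zero padding 1 and independent strides."""
    in_h, in_w = len(grid), len(grid[0])
    out_h = (in_h - 1) // stride_y + 1  # (in + 2*pad - 3) // stride + 1 with pad = 1
    out_w = (in_w - 1) // stride_x + 1

    def at(y, x):
        # Zero padding: positions outside the input read as 0.
        return grid[y][x] if 0 <= y < in_h and 0 <= x < in_w else 0

    return [
        [
            sum(
                kernel[ky][kx] * at(oy * stride_y + ky - 1, ox * stride_x + kx - 1)
                for ky in range(3)
                for kx in range(3)
            )
            for ox in range(out_w)
        ]
        for oy in range(out_h)
    ]


# One kernel (9 weights) reused by every down-sampling step in the chain.
shared_kernel = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]
diagonal = [[(r + c) % 5 for c in range(8)] for r in range(8)]

step1 = conv3x3(diagonal, shared_kernel, stride_x=2, stride_y=1)  # 8 rows x 8 cols -> 8 x 4
step2 = conv3x3(step1, shared_kernel, stride_x=2, stride_y=1)     # 8 rows x 4 cols -> 8 x 2
print(len(step1), len(step1[0]), len(step2), len(step2[0]))
```

Because both steps read the same `shared_kernel`, adding the second down-sampling step adds no new parameters, which is the motivation for sharing given in the paper excerpt above.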
@AlexeyAB Thank you for your advice. I will get free GPUs in two weeks.
@AlexeyAB
maxpool version:
Model | +anchors | BFLOPs | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|
original | - | 5.32 | 48.7 | 25.7 |
matrix net | no | 5.27 | 49.1 | 25.4 |
matrix net | yes | 5.32 | 48.6 | 26.3 |
@WongKinYiu Thanks for results!
@AlexeyAB
It means without increasing the number of anchors.
I cannot access the cfg file now, because the power supply of my office is going to be cut off tomorrow and I have already turned off my computer.
@WongKinYiu Is Original - original MatrixNet from https://arxiv.org/abs/1908.04646v2 ?
@AlexeyAB
No, original means a yolov3-based model without the feature proposed by MatrixNet.
You can see the figure in https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-529093568. Original means a model with three yolo layers (3+3+3 = 9 anchors), and the model on the right-hand side has nine yolo layers with different aspect ratios (MatrixNet); each yolo layer predicts the bbox of one anchor (1+1+1+1+1+1+1+1+1 = 9 anchors).
cfg files based on yolov3-tiny
yolov3-tiny_3l(15.778BFLOPs).cfg.txt yolov3-tiny_3l_maxpoolmatrixnet(15.565BFLOPs).cfg.txt yolov3-tiny_3l_maxpoolmatrixnet_addanchors(15.787BFLOPs).cfg.txt
for conv version, just replace maxpool layers by conv layers with shared weights.
@WongKinYiu Thanks. What mAP@0.5 did you get for yolov3-tiny_3l_maxpoolmatrixnet_addanchors(15.787BFLOPs).cfg.txt ?
@WongKinYiu
Have you come across the original MatrixNet backbone anywhere (not just yolov3/tiny with non-uniform strides)? If yes, can you share it?
@AlexeyAB Hello,
I have not come across the original MatrixNet backbone.
In the paper, they only show the concept of MatrixNet and use it on CornerNet without telling details. But the concept can be used on YOLO.
@AlexeyAB I've updated the results in https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-529091355
@WongKinYiu Do you plan to try non-uniform strided x/y conv layers with shared weights?
@AlexeyAB Yes, I will get available GPUs in 1~2 weeks.
And I would like to solve the problem https://github.com/AlexeyAB/darknet/issues/3708#issuecomment-528140698 first. But there are so many commits to check.
@WongKinYiu Hi, did you try non-uniform strided x/y conv layers with shared weights?
@AlexeyAB Hello,
Yes, it is slightly better on mAP@0.5, but it decreases mAP@0.5:0.95. I modified the architecture to get better results, but for my project the FPS drops too much.
@WongKinYiu
Is MatrixNet - non-uniform conv-layers?
Is Modified MatrixNet - non-uniform maxpool-layers?
I apply concatenation (route layer) for the blocks with multiple in-degree.
And, for YOLO, we can simply increase the number of anchors to achieve high mAP@0.5:0.95.
@WongKinYiu Can you share the last cfg/weights-file?
Sorry, I cannot share the cfg file of our backbone due to a non-disclosure agreement. Here are the cfg files of the head; I do not use shared weights in my implementation.
implemented matrixnet - matrixnet-9anchors.txt
implemented modified matrixnet - mmatrixnet-9anchors.txt
implemented modified matrixnet with more anchors - mmatrixnet-25anchors.txt
@WongKinYiu Thanks!
@AlexeyAB Hello,
@WongKinYiu Hi,
I just temporarily changed the [yolo] layer for training. Now it uses ignore_thresh only if the class_id's match: https://github.com/AlexeyAB/darknet/commit/e6486ab594e877e0b870eab6788de9e888c35840#diff-180a7a56172e12a8b79e41ec95ae569dR316
So you can try to train one of the models with these changes.
@AlexeyAB Thanks, I will get a free GPU on Thursday.
Very interesting. Any idea why mAP did not improve when using matrix architecture?
@WongKinYiu @VideoPlayerCode Maybe the anchors used are not the most suitable ones for the different [yolo] layers.
Maybe we should use synthetic anchors like
anchors = 10,10, 30,15, 15,30, 30,30, 60,30, 30,60, 60,60, 120,60, 60,120, 120,120, 240,120, 120,240, 320,320
num=13
instead of
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
num=9
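To compare the two anchor sets, the following sketch parses each list and groups the anchors by aspect ratio. It assumes the values are width,height pairs; the helper names `parse_anchors` and `group_by_aspect` are hypothetical and not part of darknet.

```python
def parse_anchors(text):
    """Parse 'w,h, w,h, ...' into a list of (width, height) tuples."""
    values = [int(v) for v in text.replace(" ", "").split(",")]
    return list(zip(values[0::2], values[1::2]))


def group_by_aspect(anchors):
    """Split anchors into wide (w > h), square (w == h) and tall (h > w)."""
    groups = {"wide": [], "square": [], "tall": []}
    for w, h in anchors:
        key = "wide" if w > h else "tall" if h > w else "square"
        groups[key].append((w, h))
    return groups


synthetic = parse_anchors(
    "10,10, 30,15, 15,30, 30,30, 60,30, 30,60, 60,60, "
    "120,60, 60,120, 120,120, 240,120, 120,240, 320,320"
)
default = parse_anchors(
    "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"
)

for name, anchors in (("synthetic", synthetic), ("default", default)):
    groups = group_by_aspect(anchors)
    ratios = sorted({round(w / h, 2) for w, h in anchors})
    print(name, {k: len(v) for k, v in groups.items()}, ratios)
```

The synthetic set uses only the ratios 2:1, 1:1 and 1:2, while the default set spreads over intermediate ratios, which is the distinction the suggestion above relies on.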
@WongKinYiu I added model yolo_v3_tiny_pan3 matrix_gaussian aa_ae_mixup.cfg
yolov3-tiny.conv.15 pre-trained weights
mmatrixnet-25anchors.txt https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-544242134
[Gaussian_yolo] layer instead of [yolo] https://github.com/AlexeyAB/darknet/issues/4147#issuecomment-548481948
I'll get free GPUs after 1/24, I think.
@WongKinYiu @AlexeyAB hi guys. I'm going to try to implement matrix nets in our https://github.com/ultralytics/yolov3 repo.
To be able to use these updated cfg files, it looks like I need to parse the stride_x and stride_y info and then implement these special convolutions. Just thinking out loud here: I need to see if PyTorch has this capability, as right now we pass a single stride scalar to the Conv2d module. Maybe I can do this by passing an array of strides instead.
if mdef['type'] == 'convolutional':
    bn = int(mdef['batch_normalize'])
    filters = int(mdef['filters'])
    kernel_size = int(mdef['size'])
    pad = (kernel_size - 1) // 2 if int(mdef['pad']) else 0
    modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1],
                                           out_channels=filters,
                                           kernel_size=kernel_size,
                                           stride=int(mdef['stride']),
                                           padding=pad,
                                           bias=not bn))
@glenn-jocher Hello,
The stride in PyTorch can be a tuple.
@WongKinYiu right, it looks like I should update stride=int(mdef['stride']) to this. I think this is the proper order, right? Do the upsample layers need to be modified also, or is the only change required in the Conv2d() layers?
stride=(int(mdef['stride_y']), int(mdef['stride_x']))
I think modifying Conv2d is enough for MatrixNet.
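The stride handling discussed above can be sketched as a small helper that reads either a single stride or separate stride_x / stride_y values from a parsed cfg section. `nn.Conv2d` takes a tuple stride in (height, width) order, so stride_y comes first. The helper name `conv_stride` is hypothetical, and falling back to a stride of 1 when no key is present is an assumption of this sketch.

```python
def conv_stride(mdef):
    """Return (stride_y, stride_x) for a parsed [convolutional] cfg section.

    mdef maps option names to strings, as in the snippet above.
    Assumption: a missing 'stride' defaults to 1.
    """
    default = int(mdef.get("stride", 1))
    stride_y = int(mdef.get("stride_y", default))
    stride_x = int(mdef.get("stride_x", default))
    return (stride_y, stride_x)


# Plain stride: both directions use the same value.
print(conv_stride({"type": "convolutional", "stride": "2"}))
# Non-uniform stride, as in the MatrixNet-style cfg fragments in this thread.
print(conv_stride({"type": "convolutional", "stride_x": "2", "stride_y": "3"}))
```

The returned tuple can be passed directly as the `stride` argument of `nn.Conv2d` in place of `int(mdef['stride'])`.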
@WongKinYiu ok, I've updated our https://github.com/ultralytics/yolov3 repo to handle cfg [convolutional] layers with either stride or stride_x and stride_y. What is the best matrixnets cfg file you would recommend to try as a yolov3-spp.cfg replacement on COCO?
Our current yolov3-spp.cfg COCO results on ultralytics/yolov3 are about 0.40 mAP@0.5:0.95 and 0.60 mAP@0.5 (see https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-552271610), so this is the benchmark I want to test against and improve.
@glenn-jocher Hello,
I haven't designed matrixnets for big models. I think I need 2~3 days to design one.
By the way, I have a cfg file for another model, but it cannot be released currently. I think it can easily get better results than https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-552271610 using your training skills (but it needs a little more inference time than YOLOv3-SPP on Darknet). If you are interested in it, please tell me and I can send it to you.
@WongKinYiu i would have a look
@glenn-jocher Hello,
https://github.com/ruinmessi/ASFF
They use
Then AP is improved by 5.8%
With their proposed ASFF, AP is improved by 7.7%
After apply
AP is improved by 9.4%
Their code is written in PyTorch, so maybe you can add these training tricks to your repo.