Matrix Nets: A New Deep Architecture for Object Detection - mAP of 47.8@0.5...0.95 on MS COCO

WongKinYiu commented 5 years ago

Could this repo support max pooling layers with different x and y strides? I would like to implement this state-of-the-art object detector. Thanks.

AlexeyAB commented 5 years ago

https://arxiv.org/abs/1908.04646v2

xNets can be applied to any backbone, similar to FPNs. ... We detect corners for objects of different sizes and aspect ratios using different matrix layers, and simplify the matching process by removing the embedding layer entirely and regressing the object centers directly. We show that KP-xNet outperforms all existing single-shot detectors by achieving 47.8% mAP on the MS COCO benchmark.

xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP of 47.8 on MS COCO, which is higher than any other single-shot detector while using half the number of parameters and training 3x faster than the next best architecture.

AlexeyAB commented 5 years ago

@WongKinYiu

Could this repo support max pooling layers with different x and y strides?

Is this the only necessary feature for the implementation of the Matrix Net? So we should have:

stride_x=
stride_y=

Or do you need something else?

WongKinYiu commented 5 years ago

@AlexeyAB Hello, yes, but it would be even better if convolutional and average pooling layers also supported stride_x and stride_y. The paper seems to use convolutional layers for down-sampling.

AlexeyAB commented 5 years ago

@WongKinYiu Hi,

I added support for

[maxpool]
stride_x=2
stride_y=3
...

Try building a network with it. If it works well and improves accuracy, I will add stride_x and stride_y for the convolutional layer too.
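To see what non-uniform strides do to a feature map, here is a minimal Python sketch. It assumes the usual floor-based output-size formula (the one PyTorch documents for Conv2d and MaxPool2d with dilation 1); Darknet's own padding defaults may differ, and `out_size` and `pooled_shape` are hypothetical helper names, not part of either framework.

```python
def out_size(in_size, kernel, stride, pad=0):
    """Output length along one axis (assumed floor-based formula, dilation 1)."""
    return (in_size + 2 * pad - kernel) // stride + 1


def pooled_shape(height, width, kernel, stride_y, stride_x, pad=0):
    """Apply out_size independently along y (height) and x (width)."""
    return (out_size(height, kernel, stride_y, pad),
            out_size(width, kernel, stride_x, pad))


# A 3x3 window with pad=1 on a 48x48 map, using stride_x=2 and stride_y=3:
print(pooled_shape(48, 48, kernel=3, stride_y=3, stride_x=2, pad=1))  # → (16, 24)
```

A square input becomes a 16x24 map here, which is the effect MatrixNet relies on to build layers with different aspect ratios.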

WongKinYiu commented 5 years ago

@AlexeyAB Hello, thank you very very much.

AlexeyAB commented 5 years ago

It seems that MatrixNet (different strides) and TridentNet (different dilations) are very promising approaches for generalizing across objects of different sizes and aspect ratios.

AlexeyAB commented 5 years ago

@WongKinYiu Hi,

Any progress?

I added stride_x=... stride_y=... for the convolutional layer, so you can try building MatrixNet.

[convolutional]
stride_x=2
stride_y=3
...

https://arxiv.org/pdf/1908.04646v2.pdf

2.1. Layer Generation .... The upper triangular layers are obtained by applying a series of shared 3x3 convolutions with stride 1x2 on the diagonal layers. Similarly, the bottom left layers are obtained using shared 3x3 convolutions with stride 2x1. The parameters are shared across all down-sampling convolutions to minimize the number of new parameters.

@WongKinYiu Do you understand:

Does it mean that some layers share their weights (as in TridentNet)?
Which layers share weights with which? (Only layers with the same aspect ratio, e.g. do all conv layers with w=4x, h=1x share the same weights?)
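The layer-generation rule quoted above can be sketched in plain Python. This is an illustration only, not the paper's implementation: it assumes a diagonal of square feature maps that halve in both dimensions, reads "stride 1x2" as halving only the width and "stride 2x1" as halving only the height (the quote does not settle the axis order), and the names `matrix_layer_shapes`, `group_by_aspect`, `base` and `n` are hypothetical.

```python
from collections import defaultdict


def matrix_layer_shapes(base=64, n=3):
    """Return {(i, j): (height, width)} for an n x n matrix of layers.

    Diagonal cell (d, d) is a square map of side base // 2**d.
    Each step to the right halves the width (assumed 1x2 stride);
    each step down halves the height (assumed 2x1 stride).
    """
    shapes = {}
    for i in range(n):
        for j in range(n):
            d = min(i, j)                    # diagonal layer this cell derives from
            side = base // (2 ** d)
            height = side // (2 ** (i - d))  # halved once per step below the diagonal
            width = side // (2 ** (j - d))   # halved once per step right of the diagonal
            shapes[(i, j)] = (height, width)
    return shapes


def group_by_aspect(shapes):
    """Group cells by off-diagonal offset j - i; each group has one aspect ratio."""
    groups = defaultdict(list)
    for (i, j), _ in sorted(shapes.items()):
        groups[j - i].append((i, j))
    return dict(groups)


shapes = matrix_layer_shapes(base=64, n=3)
for offset, cells in sorted(group_by_aspect(shapes).items()):
    print(offset, [(cell, shapes[cell]) for cell in cells])
```

Cells with the same offset have the same width-to-height ratio, so they are the natural candidates for the "same aspect ratio" sharing scheme asked about here.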


WongKinYiu commented 5 years ago

@AlexeyAB Hello, the version with different-stride max pooling layers is training now (200k epochs). It needs 7~9 more days to finish training.

LukeAI commented 5 years ago

@WongKinYiu are you training a matrixnet with the original resnext-101 backbone? or something else?

AlexeyAB commented 5 years ago

@WongKinYiu Thanks, do you know about "shared 3x3 convolutions"? https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-526918433

WongKinYiu commented 5 years ago

@AlexeyAB Yes, I think it is similar to TridentNet. In my understanding, only layers with the same aspect ratio share the same weights. But the max pooling version has no parameters to share.

WongKinYiu commented 5 years ago

@LukeAI Hello. I do not use big models like resnet-50, resnext-101... I only train very small models.

LukeAI commented 5 years ago

which one? :) darknet-53 ?

AlexeyAB commented 5 years ago

@WongKinYiu Thanks.

You can try to train 3 models:

with [maxpool] stride_x=... stride_y=...
with [convolutional] stride_x=... stride_y=... with share_index=... in some layers, to get weights from conv-layer with this number (as in TridentNet)
then with [convolutional] stride_x=... stride_y=... antialiasing=1 with share_index=... in some layers

And compare results.

I added antialiasing=1 parameter for [convolutional] layer: https://github.com/AlexeyAB/darknet/issues/3672

WongKinYiu commented 5 years ago

@AlexeyAB Thank you for your advice. I will have free GPUs in two weeks.

WongKinYiu commented 5 years ago

@AlexeyAB

maxpool version:

| Model | +anchors | BFLOPs | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| original | - | 5.32 | 48.7 | 25.7 |
| matrix net | no | 5.27 | 49.1 | 25.4 |
| matrix net | yes | 5.32 | 48.6 | 26.3 |

AlexeyAB commented 5 years ago

@WongKinYiu Thanks for results!

What does "+anchors: no" mean? Did you implement [yolo] layers without anchors?
Can you share the cfg file?

WongKinYiu commented 5 years ago

@AlexeyAB

It means without increasing the number of anchors.

I cannot access the cfg file right now: the power supply in my office will be cut off tomorrow, so I have already turned off my computer.

AlexeyAB commented 5 years ago

@WongKinYiu Does "original" mean the original MatrixNet from https://arxiv.org/abs/1908.04646v2 ?

WongKinYiu commented 5 years ago

@AlexeyAB

No, original means a yolov3-based model, without adding the feature proposed by MatrixNet.

See the figure in https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-529093568 for reference. There, "original" means a model with three yolo layers (3+3+3 = 9 anchors). The model on the right-hand side has nine yolo layers with different aspect ratios (MatrixNet), and each yolo layer predicts the bbox of one anchor (1+1+1+1+1+1+1+1+1 = 9 anchors).
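The two anchor layouts can be written out as per-layer index lists, in the spirit of the `mask=` entries of a [yolo] section. This is a hedged sketch: `split_masks` is a hypothetical helper, and which head actually receives which anchor indices in the cfg files is not stated in this thread.

```python
def split_masks(num_anchors, num_heads):
    """Split anchor indices 0..num_anchors-1 evenly across detection heads."""
    if num_anchors % num_heads != 0:
        raise ValueError("num_anchors must be divisible by num_heads")
    per_head = num_anchors // num_heads
    return [list(range(k * per_head, (k + 1) * per_head))
            for k in range(num_heads)]


# Three yolo layers with three anchors each (3+3+3 = 9):
print(split_masks(9, 3))  # → [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Nine yolo layers with one anchor each (1+1+...+1 = 9):
print(split_masks(9, 9))
```

Both layouts use the same nine anchors; only their distribution over the heads changes.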

WongKinYiu commented 5 years ago

cfg files based on yolov3-tiny

yolov3-tiny_3l(15.778BFLOPs).cfg.txt
yolov3-tiny_3l_maxpoolmatrixnet(15.565BFLOPs).cfg.txt
yolov3-tiny_3l_maxpoolmatrixnet_addanchors(15.787BFLOPs).cfg.txt

For the conv version, just replace the maxpool layers with conv layers with shared weights.

AlexeyAB commented 5 years ago

@WongKinYiu Thanks. What mAP@0.5 did you get for yolov3-tiny_3l_maxpoolmatrixnet_addanchors(15.787BFLOPs).cfg.txt ?

AlexeyAB commented 5 years ago

@WongKinYiu

Have you come across the original MatrixNet backbone anywhere (not just yolov3/tiny with non-uniform strides)? If so, can you share it?

WongKinYiu commented 5 years ago

@AlexeyAB Hello,

I have not come across the original MatrixNet backbone.

In the paper, they only present the concept of MatrixNet and apply it to CornerNet without giving details. But the concept can be used with YOLO.

WongKinYiu commented 5 years ago

@AlexeyAB I have updated the results in https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-529091355

AlexeyAB commented 5 years ago

@WongKinYiu Do you plan to try conv layers with non-uniform x/y strides and shared weights?

WongKinYiu commented 5 years ago

@AlexeyAB Yes, I will have GPUs available in 1~2 weeks.

And I would like to solve the problem in https://github.com/AlexeyAB/darknet/issues/3708#issuecomment-528140698 first. But there are so many commits to check.

AlexeyAB commented 4 years ago

@WongKinYiu Hi, did you try non-uniform strided x/y conv layers with shared weights?

WongKinYiu commented 4 years ago

@AlexeyAB Hello,

Yes, it is slightly better on mAP@0.5, but it decreases mAP@0.5:0.95. I modified the architecture to get better results, but for my project the FPS drops too much.

AlexeyAB commented 4 years ago

@WongKinYiu

Does "MatrixNet" mean non-uniform conv layers?
Does "Modified MatrixNet" mean non-uniform maxpool layers?

WongKinYiu commented 4 years ago

I apply concatenation (a route layer) for the blocks with multiple inputs (in-degree greater than one).

WongKinYiu commented 4 years ago

And, for YOLO, we can simply increase the number of anchors to achieve high mAP@0.5:0.95.

AlexeyAB commented 4 years ago

@WongKinYiu Can you share the last cfg/weights-file?

WongKinYiu commented 4 years ago

Sorry, I cannot share the cfg file of our backbone due to a non-disclosure agreement. Here are the cfg files of the head. I do not use shared weights in my implementation.

implemented matrixnet - matrixnet-9anchors.txt
implemented modified matrixnet - mmatrixnet-9anchors.txt
implemented modified matrixnet with more anchors - mmatrixnet-25anchors.txt

AlexeyAB commented 4 years ago

@WongKinYiu Thanks!

Do you use a light custom backbone, something like tinyv3/prn/...?
Was this mAP obtained on MS COCO?

WongKinYiu commented 4 years ago

@AlexeyAB Hello,

I use a Pelee-based backbone. The FPS was measured on a GTX 1080 Ti, so I think it is about 40% faster than the FPS on a GTX Titan X.
If using our new darknet reference + prn-based model, I think the mAP will drop 3~5%, but the FPS will be doubled.
All mAP values were obtained on the MS COCO test-dev set.

AlexeyAB commented 4 years ago

@WongKinYiu Hi, I just temporarily changed the [yolo] layer for training. Now it uses ignore_thresh only if the class_id's match: https://github.com/AlexeyAB/darknet/commit/e6486ab594e877e0b870eab6788de9e888c35840#diff-180a7a56172e12a8b79e41ec95ae569dR316

So you can try to train one of the models with these changes.

WongKinYiu commented 4 years ago

@AlexeyAB Thanks, I will get a free GPU on Thursday.

Arcitec commented 4 years ago

Very interesting. Any idea why mAP did not improve when using matrix architecture?

AlexeyAB commented 4 years ago

@WongKinYiu @VideoPlayerCode Maybe the anchors used are not the most suitable ones for the different [yolo] layers.

Maybe we should use synthetic anchors like

anchors = 10,10, 30,15, 15,30, 30,30, 60,30, 30,60, 60,60, 120,60, 60,120, 120,120, 240,120, 120,240, 320,320 num=13

instead of anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 num=9
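The synthetic list above follows a regular pattern: a small square anchor, then a wide (2:1), a tall (1:2) and a square (1:1) anchor at each doubling of the base size, with the final square replaced by 320,320. A short Python sketch of that reading (the function name and its parameters are hypothetical; this only reproduces the list and is not a tuned anchor recipe):

```python
def synthetic_anchors(base_sizes=(15, 30, 60, 120),
                      smallest=(10, 10), largest=(320, 320)):
    """Build (width, height) anchors: wide, tall and square per base size."""
    anchors = [smallest]
    for s in base_sizes:
        anchors.append((2 * s, s))      # wide, 2:1
        anchors.append((s, 2 * s))      # tall, 1:2
        anchors.append((2 * s, 2 * s))  # square, 1:1
    anchors[-1] = largest               # the list above ends with 320,320 rather than 240,240
    return anchors


anchors = synthetic_anchors()
print("anchors = " + ", ".join(f"{w},{h}" for w, h in anchors))
print(f"num={len(anchors)}")  # → num=13
```

Compared with the default nine anchors, every scale here gets an explicit wide and tall variant, which lines up with heads that specialize in one aspect ratio.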

AlexeyAB commented 4 years ago

@WongKinYiu I added model yolo_v3_tiny_pan3 matrix_gaussian aa_ae_mixup.cfg

- based on yolov3-tiny, so it can be trained using the yolov3-tiny.conv.15 pre-trained weights
- pan3 block
- your matrix head mmatrixnet-25anchors.txt ( https://github.com/AlexeyAB/darknet/issues/3772#issuecomment-544242134 ), modified to use the [Gaussian_yolo] layer instead of [yolo]
- with AntiAliasing, Assisted Excitation, Mixup and random bilateral blur

https://github.com/AlexeyAB/darknet/issues/4147#issuecomment-548481948

WongKinYiu commented 4 years ago

I will get free GPUs after 1/24, I think.

glenn-jocher commented 4 years ago

@WongKinYiu @AlexeyAB hi guys. I'm going to try to implement matrix nets in our https://github.com/ultralytics/yolov3 repo.

To be able to use these updated cfg files, it looks like I need to parse the stride_x and stride_y info and then implement these special convolutions. Just thinking out loud here: I need to see if PyTorch has this capability, as right now we pass a single stride scalar to the Conv2d module. Maybe I can do this by passing an array of strides instead.

        if mdef['type'] == 'convolutional':
            bn = int(mdef['batch_normalize'])
            filters = int(mdef['filters'])
            kernel_size = int(mdef['size'])
            pad = (kernel_size - 1) // 2 if int(mdef['pad']) else 0
            modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1],
                                                   out_channels=filters,
                                                   kernel_size=kernel_size,
                                                   stride=int(mdef['stride']),
                                                   padding=pad,
                                                   bias=not bn))

WongKinYiu commented 4 years ago

@glenn-jocher Hello,

The stride in PyTorch can be a tuple.

glenn-jocher commented 4 years ago

@WongKinYiu right, it looks like I should update stride=int(mdef['stride']) to this. I think this is the proper order right? Do the upsample layers need to be modified also or is the only change required in the Conv2d() layers??

stride=(int(mdef['stride_y']), int(mdef['stride_x']))

https://pytorch.org/docs/stable/nn.html#conv2d
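On the ordering question: PyTorch documents a Conv2d stride tuple as (height, width), so stride_y comes first. Below is a minimal, torch-free sketch of the cfg parsing. It assumes `mdef` is a dict of strings as in the snippet above; `conv_stride` is a hypothetical helper and not the code that ended up in ultralytics/yolov3.

```python
def conv_stride(mdef):
    """Return a (stride_y, stride_x) tuple from a parsed [convolutional] block.

    Falls back to the scalar 'stride' key (default 1) when the per-axis
    keys are absent, so existing cfg files keep working.
    """
    default = mdef.get('stride', 1)
    stride_y = int(mdef.get('stride_y', default))
    stride_x = int(mdef.get('stride_x', default))
    return (stride_y, stride_x)


print(conv_stride({'stride': '2'}))                      # → (2, 2)
print(conv_stride({'stride_x': '2', 'stride_y': '3'}))   # → (3, 2)
```

The returned tuple could then be passed directly as `stride=conv_stride(mdef)` in the `nn.Conv2d(...)` call.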

WongKinYiu commented 4 years ago

I think modifying Conv2d is enough for MatrixNet.

glenn-jocher commented 4 years ago

@WongKinYiu ok, I've updated our https://github.com/ultralytics/yolov3 repo to handle cfg [convolutional] layers with either stride or stride_x and stride_y. What is the best matrixnets cfg file you would recommend to try as a yolov3-spp.cfg replacement on COCO?

Our current yolov3-spp.cfg COCO results on ultralytics/yolov3 are about 0.40mAP@0.5:0.95 and 0.60mAP@0.5 (see https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-552271610), so this is the benchmark I want to test against and improve.

WongKinYiu commented 4 years ago

@glenn-jocher Hello,

I haven't designed MatrixNets for big models. I think I need 2~3 days to design one.

By the way, I have a cfg file for another model, but it cannot be released currently. I think it can easily get better results than https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-552271610 using your training skills (but it needs a little more inference time than YOLOv3-SPP on Darknet). If you are interested in it, please tell me and I can send it to you.