CenterNet : Objects as Points — 2x Better Than Yolo V3 in Speed and +4.4% Coco AP

laclouis5 commented 5 years ago

CenterNet

Objects as Points seems to achieve a good speed-accuracy tradeoff, better than Yolo v3, and probably better than CornerNet (#3229).

GitHub repo here.

Like CornerNet, this one works without anchor boxes (or NMS) and can regress many other properties, such as 3D location and pose.

It may be interesting to test this new detection head with Darknet backbones + PAN instead of Hourglass/DLA.

Speed-accuracy Trade Off (Titan Xp)

[Figure: speed-accuracy trade-off plot]

Coco Challenge State of the Art Networks

[Figure: comparison with Coco challenge state-of-the-art networks]

Different Backbones for Speed-Accuracy Tradeoff

[Figure: backbone comparison]

[Figure: comparison with Yolo]

AlexeyAB commented 5 years ago

There is already MatrixNet in the Roadmap, that is faster and more accurate than CornerNet: https://github.com/AlexeyAB/darknet/issues/3772

Roadmap: https://github.com/AlexeyAB/darknet/projects/1

AlexeyAB commented 4 years ago

I added something like CenterNet: https://github.com/AlexeyAB/darknet/issues/3229#issuecomment-569412122

laclouis5 commented 4 years ago

Thank you for this implementation!

Two different papers exist for CenterNet: Keypoint Triplets (the one you implemented) and Objects as Points.

I tried the second one with the authors' repo, but the results were not as good as expected on my personal dataset: Yolo v3 Tiny Pan3 is still more accurate and faster.

I'll post complete results in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673 as usual in a week or two.

AlexeyAB commented 4 years ago

@laclouis5

Did you try CenterNet dla-34 512x512 ? Can you add results to your table? https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673

Also try to train Yolo v3 Tiny Pan3 with pre-trained weights: https://drive.google.com/file/d/18v36esoXCh-PsOKwyP2GWrpYDptDY8Zf/view?usp=sharing

laclouis5 commented 4 years ago

@AlexeyAB

I trained Yolo v3 Tiny Pan3, but with yolov3-tiny.conv.15 as stated in the "How to train Tiny Yolo" section. Is there a major difference?

I also trained CenterNet dla-34 512x512 as well as CenterNet resnet-18 512x512.

I'm currently on vacation and can't access the training server; I'll post everything in a few weeks.

AlexeyAB commented 4 years ago

@laclouis5

I trained Yolo v3 Tiny Pan3, but with yolov3-tiny.conv.15 as stated in the "How to train Tiny Yolo" section. Is there a major difference?

No. You can use any of these files.

I also trained CenterNet dla-34 512x512 as well as CenterNet resnet-18 512x512.

I'm currently on vacation and can't access the training server; I'll post everything in a few weeks.

Thanks. It will also be interesting to compare the speed (FPS) of models: CenterNet dla-34 512x512 vs CenterNet resnet-18 512x512 vs CSPResNeXt50-PANet-SPP.

It seems that CenterNet dla-34 512x512 is slower than stated: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/1#issuecomment-569684933

laclouis5 commented 4 years ago

@AlexeyAB

Sure, I'll add FPS for all networks, including CenterNet. Which command should I use to measure FPS precisely with the Darknet framework? Should I run the demo with -dont_show and then average the FPS?

AlexeyAB commented 4 years ago

I run demo with -dont_show then average FPS?

Yes, just run ./darknet detector demo ... test.mp4 -dont_show on a video file (not a video camera).
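One way to average the FPS over a run is to save the demo's console output to a file and parse it afterwards. The sketch below assumes the output contains tokens of the form `FPS:23.4`; that log format is an assumption and may differ between Darknet versions, and `average_fps` is a hypothetical helper, not part of Darknet.

```python
import re
from statistics import mean

# Assumed log format: the demo prints tokens such as "FPS:23.4".
# The lookbehind skips tokens that merely end in "FPS", e.g. "AVG_FPS:".
FPS_PATTERN = re.compile(r"(?<![A-Za-z_])FPS:\s*([0-9]+(?:\.[0-9]+)?)")


def average_fps(log_text, skip_first=0):
    """Return the mean of all FPS readings found in log_text.

    skip_first drops that many leading readings (e.g. warm-up frames).
    """
    readings = [float(value) for value in FPS_PATTERN.findall(log_text)]
    readings = readings[skip_first:]
    if not readings:
        raise ValueError("no FPS readings found in the log")
    return mean(readings)


# Made-up sample text, only to show the call.
sample_log = "FPS:20.0\nObjects:\n\nFPS:22.0\nFPS:24.0\n"
print(average_fps(sample_log))
```

Dropping the first few readings with `skip_first` can help if the first frames are slowed down by model loading.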

laclouis5 commented 4 years ago

@AlexeyAB I just added new results and FPS in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673.

I will train Yolo v3 Spp Pan Scale with pre-trained weights when I have GPU time.

AlexeyAB commented 4 years ago

@laclouis5 Thanks!

So on GeForce GTX 1060 you get:

CenterNet dla-34 512x512 - 25 FPS
Yolo V3 CSR Spp Panet 544x544 - 18 FPS (so should be ~20 FPS 512x512)

What OS, CUDA and cuDNN versions did you use for Yolov3 and CenterNet?
What FPS can you get by using default yolov3.cfg/weights ?
Did you use pre-trained weights file for training CenterNet dla-34 512x512 ?

laclouis5 commented 4 years ago

@AlexeyAB,

Ubuntu 18.04.3 LTS, Intel i7-7700 @ 3.6GHz x 8, GeForce GTX 1060 6GB, CUDA 10.0, cuDNN 7.6

I got 19 FPS with original Yolo v3 network (544x544), 25 FPS in 512x512. I got 22 FPS with Yolo V3 CSR Spp Panet 512x512.

I used pre-trained weights from ctdet-coco-dla-2x.pth for CenterNet.

AlexeyAB commented 4 years ago

Model	Network Resolution	GTX 1060 FPS	GTX 1080Ti FPS	AP@.5	AP@.75	AP
CenterNet dla-34	512x512	25	~25	55.1%	40.8%	37.4%
CenterNet ResNet101	512x512	-	45	53.0%	36.9%	34.6%
csresnext50-panet-spp original-optimal.cfg	512x512	22	44	64.4%	45.9%	42.4%
yolov3.cfg	512x512	25	30	~56.0%	~33.0%	~32.0%

reactivetype commented 4 years ago


I think the benchmark between yolov3 and centernet-darknet53 would be interesting.

laclouis5 commented 4 years ago


What is interesting is that Yolo V3 is ~1% better in mAP@0.5 than CenterNet dla-34 but far worse in mAP@0.75 and Coco AP (by ~8% and ~6%).

I noticed the same behaviour in my tests (https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673): CenterNet dla-34 has 75.7% mAP@0.5 and 41.6% Coco AP. While this mAP@0.5 is the lowest of my trained networks, its Coco AP is one of the best (between Yolo V3 Tiny Pan Mixup (40%) and Yolo V3 Tiny Pan3 (42%)).

My interpretation is that CenterNet has better precision than Yolo but worse recall. CenterNet misses lots of detections compared to Yolo, but when it detects something, the box location and size are better and the label is ok.

For example, in your results Yolo V3 is nearly on par with CenterNet dla-34 for mAP@0.5, but at mAP@0.75 Yolo V3 loses 23 points while CenterNet loses only 14. Thus, CenterNet is more precise than Yolo in this example.
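The drop between the two IoU thresholds can be recomputed directly from the table posted above. This is only a small arithmetic check; the yolov3.cfg values are marked as approximate in the table, and `ap_drop` is a hypothetical helper name.

```python
# AP values (percent) copied from the table earlier in this thread.
# The yolov3.cfg numbers are approximate ("~") in the original table.
results = {
    "CenterNet dla-34": {"AP50": 55.1, "AP75": 40.8},
    "yolov3.cfg": {"AP50": 56.0, "AP75": 33.0},
}


def ap_drop(scores):
    """Points of AP lost when the IoU threshold goes from 0.5 to 0.75."""
    return round(scores["AP50"] - scores["AP75"], 1)


for name, scores in results.items():
    print(f"{name}: loses {ap_drop(scores)} points from AP@.5 to AP@.75")
```

A smaller drop means that more of the boxes counted as correct at IoU 0.5 are still correct at IoU 0.75, i.e. they are better localized.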

Of course newest networks such as CSR50-Panet are better than CenterNet in any category including FPS and all mAPs.

reactivetype commented 4 years ago

Of course newest networks such as CSR50-Panet are better than CenterNet in any category including FPS and all mAPs.

Would you mind sharing a reference to CSR50-Panet?

AlexeyAB commented 4 years ago

@reactivetype

csresnext50-panet-spp-original-optimal.cfg https://github.com/AlexeyAB/darknet#pre-trained-models

reactivetype commented 4 years ago

My interpretation is that CenterNet has better precision than Yolo but worse recall. CenterNet misses lots of detections compared to Yolo, but when it detects something, the box location and size are better and the label is ok.

For example, in your results Yolo V3 is nearly on par with CenterNet dla-34 for mAP@0.5, but at mAP@0.75 Yolo V3 loses 23 points while CenterNet loses only 14. Thus, CenterNet is more precise than Yolo in this example.

@laclouis5 When comparing precision/recall of two detection architectures, it would be a fair comparison if we compare Centernet and Yolo with the same backbone. I suspect the dla-34 backbone may not be efficient and optimal.

In fact, it would be possible to use csresnext50 with Centernet. The good thing about centernet is that it's anchor-free and NMS is optional making post-processing a lot lighter. It maintains precision by enriching the supervision labels. An interesting variant of Centernet is TTFNet, which makes training even faster with better labels. https://arxiv.org/abs/1909.00700
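To illustrate why NMS can be optional: in Objects as Points, detections are read off as local maxima of a per-class center heatmap (a cell is kept if it is at least as large as its 3x3 neighbours), so overlapping boxes are rarely produced in the first place. Below is a minimal pure-Python sketch of that peak extraction on a toy heatmap; the function name, the 0.3 threshold and the heatmap values are illustrative assumptions, not code from the CenterNet repo.

```python
def extract_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for each cell that is a 3x3 local maximum.

    heatmap is a list of rows of floats for a single class.
    Cells scoring below threshold are ignored.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the 3x3 neighbourhood, clipped at the borders.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score >= max(neighbours):
                peaks.append((r, c, score))
    return peaks


# Toy heatmap with two object centers.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
print(extract_peaks(heatmap))
```

In a real network this step is typically done with a 3x3 max-pooling over the heatmap tensor rather than Python loops.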

AlexeyAB commented 4 years ago

@reactivetype

Why is anchor-free good?
What mAP can CenterNet achieve without NMS?

MatrixNet is better than CenterNet, and

MatrixNet limits the size and aspect ratio of objects for each detection layer, much like anchors
MatrixNet uses soft-NMS

https://arxiv.org/pdf/1908.04646v2.pdf

https://github.com/AlexeyAB/darknet/issues/3772

KP-xNet solves problem (1) of CornerNets because all the matrix layers represent different scales and aspect ratios rather than having them all in a single layer. This also allows us to get rid of the corner pooling operation.

[Figure: comparison of models by number of parameters]

reactivetype commented 4 years ago

Why is anchor-free good?

What mAP can CenterNet achieve without NMS?

MatrixNet is better than CenterNet, and

MatrixNet limits the size and aspect ratio of objects for each detection layer, much like anchors

MatrixNet uses soft-NMS

Anchor-free is good for faster inference. It seems MatrixNet is a variant of Centernet and CornerNet. Thanks for sharing it.

Figure 1, which you shared above, compares the models by number of parameters. However, parameter count does not always correlate with actual latency.

I also see that the authors' report does not compare latencies against existing models. (Table 2 in https://arxiv.org/pdf/2001.03194.pdf)

AlexeyAB commented 4 years ago

@reactivetype

Anchor-free is good for faster inference. It seems MatrixNet is a variant of Centernet and CornerNet. Thanks for sharing it.

Execution time = 17.9 ms:

yolov3 model - 17.2 ms
3 x [yolo] layers - 0.3 ms
get_network_boxes - 0.4 ms
do_nms_sort - ~0.0 ms

https://github.com/AlexeyAB/darknet/issues/4497
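As a quick check of what these numbers imply, the share of time spent outside the network forward pass can be computed from the breakdown above. The timings are copied from this comment; do_nms_sort is reported as roughly zero and is treated as 0.0 here.

```python
# Timings in milliseconds, copied from the breakdown above.
timings_ms = {
    "yolov3 model": 17.2,
    "3 x [yolo] layers": 0.3,
    "get_network_boxes": 0.4,
    "do_nms_sort": 0.0,  # reported as ~0.0 ms
}

total_ms = round(sum(timings_ms.values()), 1)
# Everything that is not the network forward pass itself.
post_ms = round(total_ms - timings_ms["yolov3 model"], 1)
post_share = round(100 * post_ms / total_ms, 1)

print(f"total: {total_ms} ms, post-processing: {post_ms} ms ({post_share}%)")
```

Under these measurements, anchor decoding and NMS together account for only a few percent of the total execution time.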

Figure 1, which you shared above, compares the models by number of parameters. However, parameter count does not always correlate with actual latency.

I also see that the authors' report does not compare latencies against existing models. (Table 2 in https://arxiv.org/pdf/2001.03194.pdf)

Yes, there is no fair comparison of accuracy / speed.

MatrixNet + ResNext101-X is better than CenterNet + very heavy HourGlass-104:

CenterNet: https://github.com/xingyizhou/CenterNet#object-detection-on-coco-validation

keko950 commented 4 years ago

Hi @AlexeyAB , how are filters calculated in the centernet cfg?

AlexeyAB commented 4 years ago

@keko950 Hi, as usual for a [Gaussian_yolo] layer:

filters = (classes + coords + 1) * <number of masks> = (classes + 9) * 4

keko950 commented 4 years ago

@AlexeyAB Hmmm.. the cfg is wrong then?

[convolutional]
size=1
stride=1
pad=1
filters=40
activation=linear

[Gaussian_yolo]
yolo_point=right_bottom
mask = 8,9,10,11
anchors = 8,8, 10,13, 16,30, 33,23, 32,32, 30,61, 62,45, 59,119, 80,80, 116,90, 156,198, 373,326
classes=1
num=12
jitter=.3
ignore_thresh = .7
truth_thresh = 1
iou_thresh=0.213
iou_normalizer=0.5
uc_normalizer=0.5
cls_normalizer=1.0
iou_loss=mse
scale_x_y = 1.1
random=0

AlexeyAB commented 4 years ago

The cfg file is correct.

Oh yes, as usual for [Gaussian_yolo]. I fixed my previous answer.

https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

when using [Gaussian_yolo] layers, change [filters=57] to filters=(classes + 9)x3 in the 3 [convolutional] layers before each [Gaussian_yolo] layer
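The filter count discussed in this exchange can be checked with a few lines of Python. The sketch assumes coords = 8 for [Gaussian_yolo], which is inferred from the (classes + 9) form of the formula; `gaussian_yolo_filters` is a hypothetical helper, not a Darknet function.

```python
def gaussian_yolo_filters(classes, num_masks, coords=8):
    """filters = (classes + coords + 1) * number of masks.

    coords=8 is an assumption inferred from the (classes + 9) formula
    quoted in this thread.
    """
    return (classes + coords + 1) * num_masks


# The cfg posted above: classes=1 and mask = 8,9,10,11 (4 masks).
print(gaussian_yolo_filters(classes=1, num_masks=4))

# The README example: 3 masks per layer; filters=57 would match 10 classes.
print(gaussian_yolo_filters(classes=10, num_masks=3))
```

With classes=1 and 4 masks this gives 40, which matches filters=40 in the posted cfg.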

keko950 commented 4 years ago

Nice, thank you for your time!
