Comparison of some models on CPU vs VPU (neurochip) vs GPU

AlexeyAB commented 4 years ago

batch=1 (sync-mode)
CPU, VPU
- OpenCV 4.2.0 (master-branch 21 Mar 2020)
- OpenVINO 2020.1.033
GPU
- CUDA 10.0
- cuDNN 7.4.2
- Darknet (Mar 22, 2020) GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1

Accuracy and FPS:

| Model | AP50...95 (MSCOCO), accuracy | mAP50 (MSCOCO), accuracy | CPU - 90 Watt - FP32 (Intel Core i7-6700K 4GHz 8 Logical Cores) OpenCV-DLIE, FPS | VPU - 2 Watt - FP16 (Intel Myriad X) OpenCV-DLIE, FPS | GPU - 175 Watt - FP32/16 (nVidia GeForce RTX 2070) Darknet-cuDNN, FPS |
|---|---|---|---|---|---|
| yolov4-tiny 416x416 | | 40.2% | - | - | 330 |
| yolov3-tiny 416x416 | | 33.1% | 35 | 6.5 | 340 |
| yolov3-tiny-PRN 416x416 | | 33.1% | 46 | 5.3 | 370 |
| EfficientNetB0-Yolo 416x416 | | 45.5% | 11 | - | 55 |
| yolov3 416x416 | 31.0% | 55.3% | - | - | - |
| yolov3-spp 512x512 | | ~59.6% | 3.3 | 1.1 | 52 |
| csresnext50-opt 512x512 | 42.4% | 64.4% | 3.5 | 0.64 | 37 |
| csdarknet53-opt 256x256 async=3 | 33.3% | 53.0% | 14 | 11 | 74 |
| csdarknet53-opt 512x512 | 42.4% | 64.5% | 3.5 | 1.23 | 50 |
| csdarknet53-mish 512x512 (YOLOv4) | 43.0% | 64.9% | - | - | 50 |
| csresnext50-opt 608x608 | 43.2% | 65.4% | - | - | 34 |
| csdarknet53-mish 608x608 (YOLOv4) | 43.5% | 65.7% | - | - | 37 |
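
Since the three devices differ widely in power, it can help to normalize FPS by the wattage given in the column headers. The sketch below is only an illustration: `fps_per_watt` is a hypothetical helper, and it assumes the 90 W, 2 W and 175 W figures are nominal ratings rather than measured power draw.

```cpp
#include <cassert>

// Hypothetical helper: frames per second delivered per watt of rated power.
// Assumption: `watts` is the nominal figure from the table header, not a measurement.
double fps_per_watt(double fps, double watts) {
    assert(watts > 0.0);  // a zero or negative power rating makes no sense here
    return fps / watts;
}
```

For the csdarknet53-opt 256x256 async=3 row this gives roughly 0.16 FPS/W on the CPU (14/90), 5.5 FPS/W on the VPU (11/2) and 0.42 FPS/W on the GPU (74/175).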

WongKinYiu commented 4 years ago

@AlexeyAB Hello,

So currently EfficientNetB0-Yolo is the fastest model on VPU?

AlexeyAB commented 4 years ago

@WongKinYiu Hi,

Yes, it seems the VPU (Intel Myriad X) is highly optimized for grouped convolutions and maybe SE blocks. I will test it more.

Maybe with the new Google Coral Edge TPU the performance ratio will, in general, be the same as with the Intel Myriad X.

So maybe it makes sense to train GhostNet ghostnet.cfg.txt and yolov3-tiny-3l-ghostnet (as a new tiny-yolo model): https://github.com/AlexeyAB/darknet/issues/4418#issue-530577441

WongKinYiu commented 4 years ago

@AlexeyAB Thanks,

ghostnet is training now, at 40k/800k iterations.

AlexeyAB commented 4 years ago

@WongKinYiu Are you training ghostnet with CutMix+Mosaic+Label-smoothing?

Also, did we get an improvement for any network with DropBlock?

LukeAI commented 4 years ago

This is a fantastic resource. If at all possible, it'd be great to also see results for "batch=4" or similar.

WongKinYiu commented 4 years ago

@AlexeyAB No, just the ghostnet.cfg.txt you provided before.

AlexeyAB commented 4 years ago

@WongKinYiu I also added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg that you can try to train with subdivisions=6 or 4

WongKinYiu commented 4 years ago

@AlexeyAB Thanks, I am looking at the code in the new commits.

WongKinYiu commented 4 years ago

@AlexeyAB I set subdivisions=4 and the training has started now.

ShaneHsieh commented 4 years ago

Hi @AlexeyAB, when you test on CPU and VPU, do you use FP32? As far as I know, the VPU can use FP16 and INT8. This information is very important.

AlexeyAB commented 4 years ago

@ShaneHsieh I added this information: CPU uses FP32, VPU uses FP16, GPU uses FP32/16 (Tensor Cores). Each device uses the lowest floating-point precision that increases speed without loss of accuracy.

ShaneHsieh commented 4 years ago

Thanks. Comparing CPU and GPU at FP32, the CPU gets relatively better performance with EfficientNetB0-Yolo. This is good information.

andeyeluguo commented 4 years ago

What does OpenCV-DLIE mean?

WongKinYiu commented 4 years ago

OpenCV-DLIE means OpenCV with the Deep Learning Inference Engine (DLIE) backend, supported by the OpenVINO Toolkit.

WongKinYiu commented 4 years ago

Yes, you can use the OpenCV dnn module to run the models, for example yolov3, yolov3-tiny-prn, efficientnetb0-yolo...

But because the Mish activation function and the "eliminate grid sensitivity" feature are not yet supported by the OpenCV dnn module, you cannot run yolov4 at this time.

andeyeluguo commented 4 years ago

Does it support AlexeyAB's version? So far I have only found the TensorFlow Yolo version that OpenVINO supports.

WongKinYiu commented 4 years ago

For your reference: https://github.com/opencv/opencv/pull/16436

andeyeluguo commented 4 years ago

Could you please give me a tutorial on how to convert the cfg file to the xml format that OpenVINO supports? I saw the question "Does OpenCV-OpenVINO version supports Yolo v3 network?" on the site. It may have been asked by AlexeyAB.

WongKinYiu commented 4 years ago

Darknet is supported already. https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV

AlexeyAB commented 4 years ago

@andeyeluguo To use Yolo with OpenVINO (on CPU, GPU, VPU, ...) you should:

1. Install OpenVINO as usual.
2. Install OpenCV with the OpenVINO backend: https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend
3. Run yolov3.cfg + yolov3.weights by using OpenCV-dnn: https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html Examples of how to use Yolo:
- https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
- https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py

YOLOv4 will be supported for OpenCV+OpenVINO soon: https://github.com/opencv/opencv/issues/17148

I added Yolo v2 to OpenCV 2.5 years ago: https://github.com/opencv/opencv/pull/9705

mmaaz60 commented 4 years ago

Can these models also be run on NCS 2 using the OpenCV DNN module with IE backend?

Luxonis-Brandon commented 4 years ago

@mmaaz60 it seems like that is the case. We will be trying on DepthAI (Myriad X based) shortly and will circle back.

Also @AlexeyAB if you have any instructions on how to use YOLOv4 on VPU, we'd be keen to try them out on DepthAI.

AlexeyAB commented 4 years ago

@Luxonis-Brandon

The current version of YOLOv4 is for real-time use on GPU. Later we will release YOLOv4-VPU for real-time (>= 30 FPS) use on VPU.

[Image: modern_gpus]

There are two ways to run YOLOv4 on MyriadX:

- Support for YOLOv4 in OpenVINO: wait until it is added to OpenVINO
- Support for YOLOv4 in OpenCV-dnn (with OpenVINO IE-backend): wait until this issue is solved: https://github.com/opencv/opencv/issues/17148

Right now, you can try a slightly simpler version of YOLOv4 (0.5% worse) on the Intel MyriadX VPU by using C++ with OpenVINO, with one of these settings:

- width=512 height=512 in cfg: 42.4% AP at 1.2 FPS (look at the table https://github.com/AlexeyAB/darknet/issues/5079#issue-585403577)
- width=320 height=320 in cfg: 40.5% AP at 3 FPS
- width=320 height=320 in cfg: 40.5% AP at ~7 FPS with async=3 streams

use

cfg: https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1
weights: https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV
example: https://github.com/opencv/open_model_zoo/tree/master/demos/object_detection_demo_yolov3_async
1. just change anchors https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L118-L119 to these values: https://github.com/AlexeyAB/darknet/blob/36c73c5b9e3f2e72049fb68566e32632f6c70e85/cfg/yolov4.cfg#L1141
2. instead of this code: https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L196-L197 use this code
```
// actually should be 1.05, 1.1 and 1.2 for the corresponding [yolo] layers instead of 1.1
    double x = (col + output_blob[box_index + 0 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_w;
    double y = (row + output_blob[box_index + 1 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_h;
```
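
The per-layer factors mentioned in the comment can be passed in instead of hard-coding 1.1. The following is a minimal sketch of that idea, not code from the Open Model Zoo demo: `decode_center` and its parameter names are hypothetical, and which factor belongs to which [yolo] layer should be taken from the `scale_x_y` values in the cfg.

```cpp
#include <cassert>

// Hypothetical helper (not part of the demo): decodes one box-center coordinate.
//   cell       - column (for x) or row (for y) index of the grid cell
//   raw        - network output for this coordinate (the output_blob value above)
//   scale_x_y  - per-layer factor; 1.0 reproduces the plain YOLOv3 formula
//   side       - grid size of the current [yolo] layer
//   resized_im - resized image width (for x) or height (for y) in pixels
double decode_center(int cell, double raw, double scale_x_y, int side, double resized_im) {
    assert(side > 0);
    // Same expression as in the snippet above, with 1.1 replaced by scale_x_y.
    return (cell + raw * scale_x_y + (1.0 - scale_x_y) / 2.0) / side * resized_im;
}
```

In the demo, this would be called once for `x` (with `col` and `resized_im_w`) and once for `y` (with `row` and `resized_im_h`), using the factor of the layer being parsed.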

AlexeyAB commented 4 years ago

@Luxonis-Brandon

I just tested csdarknet53-opt (YOLOv4 without Mish; in cfg set width=256 height=256; 33.3% AP | 53.0% AP50) on your DepthAI (Myriad X) device with network resolution 256x256 and async=3 by using OpenCV (OpenVINO IE-backend) and got 11 FPS.

AlexeyAB commented 4 years ago

[Image: OpenCV_Vs_TensorRT]

ausk commented 4 years ago

OpenCV 4.4.0-pre (self-compiled), OpenVINO 2020.R3, Myriad target: net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

Input 416x416

- efficient-b0: 395 ms
- yolov3: 550 ms
- yolov3-tiny-prn: 168 ms
- yolov3-tiny: 128 ms
- yolov4: 940 ms
- efnet-coco: 395 ms
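
To compare these latencies with the FPS columns in the table at the top of the thread, they can be converted with FPS = 1000 / ms. This is a rough sketch: `latency_ms_to_fps` is a hypothetical helper, and it assumes each figure is the time for one frame in synchronous mode (batch=1).

```cpp
#include <cassert>

// Hypothetical helper: converts a per-frame latency in milliseconds to frames per second.
// Assumption: one frame is processed at a time (no async streams, no batching).
double latency_ms_to_fps(double latency_ms) {
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}
```

For example, 128 ms corresponds to about 7.8 FPS and 395 ms to about 2.5 FPS.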

AlexeyAB commented 4 years ago

YOLOv4-tiny released: https://github.com/AlexeyAB/darknet/issues/6067

linyib commented 6 months ago

Hi, does anyone have the efficientnet-lite3.weights file? Could you share it with me?
