AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)

http://pjreddie.com/darknet/

Tiny-Yolo OpenCL inference engine is available !! #225

Open · spedagadi opened this issue 6 years ago

spedagadi commented 6 years ago

@AlexeyAB, thanks for your work on the Windows port of YOLO. I am excited to share that the Tiny-YOLO inference engine has been ported to run on OpenCL hardware. The project started from your Windows port of YOLO. Here it is: https://github.com/sat8/YoloOCLInference.git. I would be glad to hear any comments or feedback. Please spread the word, thanks.

AlexeyAB commented 6 years ago

@sat8 Hi, great work! You achieve ~208 FPS on a GTX 1080 Ti using OpenCL; how many FPS do you get using CUDA on the same GTX 1080 Ti?

Also, it should be easy to implement the full version of YOLO:

reorg_layer just permutes elements: https://github.com/AlexeyAB/yolo2_light/blob/f18a82b22b2381266cb839e94cdf7a9c0e6166b2/src/gpu.cu#L399
route_layer just copies the bottom layers one after another, without any modification: https://github.com/AlexeyAB/yolo2_light/blob/f18a82b22b2381266cb839e94cdf7a9c0e6166b2/src/yolov2_forward_network_gpu.cu#L60
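To illustrate the two layers just described, here is a minimal C++ sketch, not darknet's code. It assumes a flat NCHW float buffer. `space_to_depth` (a hypothetical name) moves each `stride x stride` spatial block into channels, which is one common way to define a reorg; darknet's own `reorg_cpu` uses a different element ordering, so this sketch is not interchangeable with trained YOLOv2 weights. `route_concat` (also hypothetical) copies each bottom layer's slice one after another per batch item, which is what concatenation along the channel axis amounts to in NCHW.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reorg sketched as space-to-depth on one c x h x w image (NCHW, batch of 1).
// Output shape: (c * stride * stride) x (h / stride) x (w / stride).
// NOTE: channel ordering here is an assumption; darknet's reorg_cpu differs.
std::vector<float> space_to_depth(const std::vector<float>& in,
                                  int c, int h, int w, int stride) {
    assert(stride > 0 && h % stride == 0 && w % stride == 0);
    assert(in.size() == static_cast<std::size_t>(c) * h * w);
    const int oh = h / stride, ow = w / stride;
    std::vector<float> out(in.size());
    for (int ic = 0; ic < c; ++ic)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                // Position inside the stride x stride block selects the channel group.
                const int oc = ((y % stride) * stride + (x % stride)) * c + ic;
                const std::size_t src =
                    (static_cast<std::size_t>(ic) * h + y) * w + x;
                const std::size_t dst =
                    (static_cast<std::size_t>(oc) * oh + y / stride) * ow + x / stride;
                out[dst] = in[src];  // pure permutation: each element copied once
            }
    return out;
}

// Route sketched as channel concatenation: for every batch item, copy each
// input layer's slice one after another without modifying the values.
std::vector<float> route_concat(const std::vector<std::vector<float>>& inputs,
                                int batch) {
    assert(batch > 0);
    const std::size_t nb = static_cast<std::size_t>(batch);
    std::size_t per_item = 0;  // floats per batch item in the output
    for (const auto& layer : inputs) {
        assert(layer.size() % nb == 0);
        per_item += layer.size() / nb;
    }
    std::vector<float> out(per_item * nb);
    std::size_t offset = 0;  // start of this layer's slice inside one batch item
    for (const auto& layer : inputs) {
        const std::size_t n = layer.size() / nb;
        for (std::size_t b = 0; b < nb; ++b) {
            const auto first = layer.begin() + static_cast<std::ptrdiff_t>(b * n);
            std::copy(first, first + static_cast<std::ptrdiff_t>(n),
                      out.begin() + static_cast<std::ptrdiff_t>(b * per_item + offset));
        }
        offset += n;
    }
    return out;
}
```

For example, routing a 2-element and a 1-element layer with batch 2 (`{1, 2, 3, 4}` and `{5, 6}`) yields `{1, 2, 5, 3, 4, 6}`.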

This is a heavily truncated version of darknet that keeps only what YOLO needs, so it is very easy to learn from: https://github.com/AlexeyAB/yolo2_light/tree/master/src

CPU yolo inference in 1 file: yolov2_forward_network.c
GPU yolo inference in 2 files: yolov2_forward_network_gpu.cu, gpu.cu

spedagadi commented 6 years ago

@AlexeyAB

I modified detector.c in the darknet project to measure performance over 1000 sequential iterations; the command-line output is below.

One thing to note is the large variability in the FPS values (142, 166, 200). I am not sure whether this is due to the timer used to measure inference duration: I use std::chrono, whereas detector.c in darknet uses the native clock(). Getting std::chrono to work in detector.c is not straightforward (I got a bunch of compilation errors), presumably because detector.c is compiled as C while std::chrono is a C++ library.
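One possible contributor to the timer question: on POSIX systems, `clock()` reports processor time consumed by the process, not elapsed wall-clock time, so time the CPU spends blocked (for example, waiting on the GPU without spinning) is not counted. MSVC's `clock()`, by contrast, returns elapsed wall time. The sketch below uses a hypothetical helper `time_both` to time the same callable with `std::clock()` and `std::chrono::steady_clock`; `simulated_blocking_wait` is a hypothetical stand-in for a blocking wait on the device.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct Timings {
    double cpu_seconds;   // from std::clock(): processor time on POSIX
    double wall_seconds;  // from std::chrono::steady_clock: elapsed real time
};

// Hypothetical stand-in for a blocking wait on the GPU: sleeps, so it
// consumes almost no CPU time while wall-clock time keeps running.
void simulated_blocking_wait(int ms) {
    assert(ms >= 0);
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}

// Time a single call of `work` with both clocks.
template <typename F>
Timings time_both(F&& work) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();
    work();
    const auto w1 = std::chrono::steady_clock::now();
    const std::clock_t c1 = std::clock();
    return {static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
            std::chrono::duration<double>(w1 - w0).count()};
}
```

On Linux, `time_both([] { simulated_blocking_wait(50); })` typically reports `cpu_seconds` near zero while `wall_seconds` is at least 0.05; on Windows the two values track each other. If the two clocks disagree for a real inference call, FPS numbers computed from `clock()` are not comparable to ones computed from `std::chrono`.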

If these FPS values are accurate, my OpenCL port appears to achieve higher processing speeds (see the output for the OpenCL port; the detection outputs are identical in both cases).
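Checking that two backends produce the same detections is usually done with a small tolerance, since CPU, CUDA and OpenCL kernels can round floating-point results differently. A minimal C++ sketch; the `Detection` fields (class id, probability, box center and size), the function name `same_detections` and the default tolerance are all assumptions for illustration, not the data layout of either project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical detection record; field names are illustrative only.
struct Detection {
    int class_id;
    float prob;
    float x, y, w, h;  // box center and size, normalized to [0, 1]
};

// True when both lists have the same length and, element by element, equal
// class ids and probabilities/boxes within `eps`.
// Assumes both backends emit detections in the same order.
bool same_detections(const std::vector<Detection>& a,
                     const std::vector<Detection>& b, float eps = 1e-4f) {
    assert(eps >= 0.0f);
    if (a.size() != b.size()) return false;
    auto close = [eps](float u, float v) { return std::fabs(u - v) <= eps; };
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].class_id != b[i].class_id) return false;
        if (!close(a[i].prob, b[i].prob) || !close(a[i].x, b[i].x) ||
            !close(a[i].y, b[i].y) || !close(a[i].w, b[i].w) ||
            !close(a[i].h, b[i].h))
            return false;
    }
    return true;
}
```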

If you have some time, could you run darknet with tiny-yolo.cfg on your side and check my numbers? Thanks.
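When cross-checking numbers like these, timing many iterations after a short warm-up and reporting the spread, rather than a single value, makes variability such as 142/166/200 FPS visible. A minimal C++ sketch using `std::chrono::steady_clock`; `benchmark_fps`, `FpsStats` and the default warm-up count are hypothetical choices. The callable stands in for one tiny-yolo forward pass, and for a GPU backend it is assumed to block until the device work has finished, otherwise only launch overhead is measured.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct FpsStats {
    double mean_fps;  // iterations / total timed seconds
    double min_fps;   // from the slowest iteration
    double max_fps;   // from the fastest iteration
};

// Convert a per-frame duration in seconds to frames per second.
inline double fps_from_seconds(double seconds_per_frame) {
    assert(seconds_per_frame > 0.0);
    return 1.0 / seconds_per_frame;
}

// Run `infer` `warmup` times untimed, then `iterations` times, timing each
// call separately with a monotonic clock.
template <typename F>
FpsStats benchmark_fps(F&& infer, std::size_t iterations, std::size_t warmup = 10) {
    assert(iterations > 0);
    for (std::size_t i = 0; i < warmup; ++i) infer();
    std::vector<double> secs;
    secs.reserve(iterations);
    double total = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        infer();
        const auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        secs.push_back(s);
        total += s;
    }
    const auto mm = std::minmax_element(secs.begin(), secs.end());
    return {static_cast<double>(iterations) / total,
            fps_from_seconds(*mm.second),   // slowest frame -> lowest FPS
            fps_from_seconds(*mm.first)};   // fastest frame -> highest FPS
}
```

Reporting min/mean/max from the same run separates genuine backend differences from run-to-run jitter.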

spedagadi commented 6 years ago

Just an update: both Linux and Windows are now supported in my repo https://github.com/sat8/YoloOCLInference.git.

Another note: CUDA 9.0 is available, and the OpenCL port somehow performs better on Linux, reaching ~227 FPS compared with ~208 FPS on Windows. Stay tuned for more updates...
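For scale, converting the reported frame rates to per-frame latency (assuming both figures are averages over a full run):

$$
t = \frac{1000\ \text{ms}}{\text{FPS}}, \qquad
t_{\text{Windows}} = \frac{1000}{208} \approx 4.81\ \text{ms}, \qquad
t_{\text{Linux}} = \frac{1000}{227} \approx 4.41\ \text{ms}
$$

So Linux saves about 0.40 ms per frame, a speedup of $227/208 \approx 1.09$ (about 9%).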