AlexeyAB opened this issue 5 years ago
This'd probably be great for my use-case, actually. I'm looking for the most accurate object detector that can run at around 150+ FPS on a 2080 Ti. Currently looking at using pan2_swish_scale or something along those lines, but maybe ThunderNet is the answer :)
@LukeAI It seems that ThunderNet is fast on CPU, but not on GPU. Just like EfficientNet, as you know.
@LukeAI @AlexeyAB hi guys, just noticed this post. I'm not familiar with ThunderNet, but I believe we've already beaten their results with YOLOv3 on Apple's A12 ASIC.
The iDetection iOS app runs YOLOv3-SPP 320 (52.4 mAP @0.5) inference on vertical 4K video at 24+ FPS on iPhone XS and 30+ FPS on iPad Pro, and anyone can download it for free and test it out themselves in seconds (repeatable results).
BTW, to add some more details: this is the full yolov3-spp.weights model (not the tiny version), trained with darknet and exported via PyTorch > ONNX > CoreML. The screenshots you see on the link are a bit outdated (we need to shoot some new ones) and show the FPS before we optimized the export pipeline for speed. A current screenshot looks like this. The FPS constantly fluctuates of course, so this screenshot happened to capture 18 FPS, but we typically see a steady-state 24 FPS on iPhone XS for at least the first 5-10 minutes, before overheating starts to become an issue and the iPhone begins throttling the Neural Engine. With no external cooling, after one hour of continuous running the FPS drops to about 5-10 FPS.
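For anyone curious about the export path, a minimal sketch of the PyTorch → ONNX step might look like the snippet below. The model class and weight-loading helper names are hypothetical stand-ins (the thread doesn't show the actual code), and the darknet → PyTorch load and the ONNX → CoreML conversion are only indicated in comments:

```python
import torch

# Assumption: a PyTorch re-implementation of YOLOv3-SPP whose layers mirror
# yolov3-spp.cfg and which can ingest the converted darknet weights.
from models import Darknet, load_darknet_weights  # hypothetical helpers

model = Darknet("cfg/yolov3-spp.cfg")
load_darknet_weights(model, "yolov3-spp.weights")  # darknet -> PyTorch
model.eval()

# Export to ONNX at the 320x320 input size mentioned above.
dummy = torch.zeros(1, 3, 320, 320)
torch.onnx.export(model, dummy, "yolov3-spp.onnx",
                  input_names=["image"], output_names=["detections"],
                  opset_version=11)

# The resulting ONNX graph can then be converted to CoreML (e.g. with the
# onnx-coreml / coremltools converters) and dropped into the iOS app.
```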
BUT there's a new A13 ASIC coming out next month in the iPhone 11, so I'm hoping this will finally push inference to 30 FPS and possibly reduce the thermal issue a bit too!! :)
Is there any official implementation of ThunderNet available where we can reproduce the results mentioned in the paper? Thanks
Without bells and whistles, our model runs at 24.1 fps on an ARM-based device.
CEM (Context Enhancement Module) and SAM (Spatial Attention Module) of ThunderNet.
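To make the two modules concrete, here is a rough PyTorch-style sketch of CEM and SAM based on the paper's description (not official code; the channel counts and shapes are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEM(nn.Module):
    """Context Enhancement Module: merge C4, C5 and a global-context vector."""
    def __init__(self, c4_ch=120, c5_ch=512, out_ch=245):  # channel counts are assumptions
        super().__init__()
        self.conv_c4 = nn.Conv2d(c4_ch, out_ch, 1)
        self.conv_c5 = nn.Conv2d(c5_ch, out_ch, 1)
        self.conv_glb = nn.Conv2d(c5_ch, out_ch, 1)

    def forward(self, c4, c5):
        # C4 keeps its resolution; C5 is upsampled to match it; the global
        # branch is a pooled 1x1 feature broadcast over the spatial grid.
        f4 = self.conv_c4(c4)
        f5 = F.interpolate(self.conv_c5(c5), size=c4.shape[2:], mode="nearest")
        fg = self.conv_glb(F.adaptive_avg_pool2d(c5, 1))
        return f4 + f5 + fg  # broadcast add of the global feature

class SAM(nn.Module):
    """Spatial Attention Module: reweight CEM features with RPN features."""
    def __init__(self, rpn_ch=256, feat_ch=245):
        super().__init__()
        self.conv = nn.Conv2d(rpn_ch, feat_ch, 1)
        self.bn = nn.BatchNorm2d(feat_ch)

    def forward(self, cem_feat, rpn_feat):
        # Attention map from the RPN feature gates the CEM feature map.
        attn = torch.sigmoid(self.bn(self.conv(rpn_feat)))
        return cem_feat * attn
```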
CEM:
SAM:
Results: