AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

efficient depthwise convolutions #4914

Open LukeAI opened 4 years ago

LukeAI commented 4 years ago

Is anybody aware of any hardware that can efficiently train/infer depthwise convolutions (other than full datacentre TPUs)? Is it in the pipeline for Nvidia GPUs? It really seems like the way forward for DNNs, but in practice it's currently limited to small CPU networks.

AlexeyAB commented 4 years ago

As I understand it, depthwise convolution isn't used in production even on TPU / Edge TPU: https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html

When performing the architecture search described above, one must consider that EfficientNets rely primarily on depthwise-separable convolutions, a type of neural network block that factorizes a regular convolution to reduce the number of parameters as well as the amount of computations. However, for certain configurations, a regular convolution utilizes the Edge TPU architecture more efficiently and executes faster, despite the much larger amount of compute.
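
To put numbers on the "reduce the number of parameters as well as the amount of computations" part of the quote, here is a rough sketch in plain Python. It is not taken from Darknet or from the blog post; the function names and the layer shape (3x3 kernel, 128 -> 128 channels, 56x56 output map) are assumptions chosen only for illustration, and bias terms are ignored.

```python
def conv_cost(c_in, c_out, k, h_out, w_out):
    """Weights and multiply-accumulates (MACs) for a regular k x k convolution, no bias."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output position
    return params, macs


def separable_cost(c_in, c_out, k, h_out, w_out):
    """Same counts for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer shape, picked only for illustration.
regular = conv_cost(128, 128, 3, 56, 56)
separable = separable_cost(128, 128, 3, 56, 56)
ratio = regular[1] / separable[1]

print("regular   (params, MACs):", regular)
print("separable (params, MACs):", separable)
print("MAC ratio regular/separable:", round(ratio, 2))
```

For this assumed shape the separable block needs roughly 8.4x fewer MACs. That is only the arithmetic count; as the quote says, a lower count does not by itself mean the layer executes faster on a given accelerator.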