AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Some thing about depthwise #1684

Open anqingjianke opened 6 years ago

anqingjianke commented 6 years ago

AlexeyAB, I am very glad to use your project. It helps me a lot, thanks. I have some questions about "Darknet-YOLO" and would like to seek your advice. Of course, anyone else is welcome to join the discussion.

After reading the papers and studying your project and the "pjreddie/darknet" project, my initial thought was that if the number of required categories is not very large, Darknet-53 is too complex. So I think introducing MobileNet-V2 (Google) is a good idea: it can not only simplify the model but also improve speed.

I found that hjimce has made a similar project, https://github.com/hjimce/darknet_mobilenet but it is based on YOLOv2. So I tried to merge hjimce's files, "depthwise_convolutional_kernels.cu", "depthwise_convolutional_layer.c" and "depthwise_convolutional_layer.h", into both your project and pjreddie's project.

pjreddie's project, https://github.com/pjreddie/darknet compiles successfully.

[screenshots]

I trained a model and tested it. Its computational cost is small, 14.5 BFLOPs, and testing speed is high.

[screenshot]

However, the detector recognized nothing, even though nothing abnormal happened during training. I also tried using just a few depthwise convolutional layers to replace ordinary convolutional layers; this time it worked.

For your project, since I am unfamiliar with CUDA code, I only changed variables to make the two projects match. Sure enough, the project compiles but cannot train.

[screenshot]

gdb shows the error position:

[screenshot]

My GPU is a Titan XP 12 GB; CUDA version: 8.0, cuDNN version: 5.1.

In conclusion, my questions/opinions are as follows:

  1. I am sorry that I know little about cuDNN. Could you give me some advice about the cause of the core dump?
  2. I found many differences between your code and pjreddie's code, especially in functions and parameters. For instance, you use the "gradient_array_ongpu" function in the ".cu" files while pjreddie uses "gradient_array_gpu". Could you explain the difference?
  3. I think MobileNet-V2 is a good structure for model simplification. Do you plan to integrate this module into your project?

Thank you!

AlexeyAB commented 6 years ago

@anqingjianke Hi,

  1. Depthwise-convolutional layers are not implemented in this repository, so adding them can lead to errors such as a core dump.

  2. gradient_array_ongpu and gradient_array_gpu are just different names for the same code.

  3. Maybe I will add depthwise-convolutional layers (MobileNet-V2) later. Currently I am thinking about quantization (float-32bit, float-16bit/Tensor Cores, int-8, bit-1, ...).

Why do you want to use MobileNet-V2? Is the speed of yolov3-tiny.cfg not enough? It should run at more than 200 FPS on a Titan XP (GP102-450-A1, 11 TFLOPS-sp). Or do you want to use it on the CPU?

anqingjianke commented 6 years ago

@AlexeyAB

Thanks for your detailed reply.

Yes, I want to use it on the CPU, especially on low-voltage laptop CPUs. I think a depthwise structure can make the network lightweight more effectively, so I want to try MobileNet-V2 as the backbone.

Looking forward to your new work.

aimhabo commented 5 years ago

I also tried to port hjimce's depthwise_conv into your version, but when reaching the 2nd iteration, everything becomes NaN. [screenshots]

And when I run depthwise_convolutional_layer.c::test_depthwise_convolutional_layer() directly:

```
DWconv    3  3 x 3 / 1   5 x 5 x 3  ->  3 x 3 x 3  0.000 BF
avg                      3 x 3 x 3  ->  3
softmax                                 3
cost                                    3
cost: 0.825286
backward
predicting. predict 0.000000
*** Error in `/home/wit/darknet/darknet': corrupted double-linked list: 0x000000000133d5e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb3636797e5]
...
/usr/local/cuda/lib64/libcudnn.so.5(+0x3ab36)[0x7fb363c1cb36]
======= Memory map: ========
00400000-00550000 r-xp 00000000 08:06 11629442
...
```

I found that hjimce uses 'fill_cpu' (https://github.com/hjimce/darknet_mobilenet/blob/1ef567672110492fab2075851d0850f42feac67c/src/network.c#L192) and pjreddie uses 'fill_cpu' too (https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/network.c#L202). In your version it is 'scal_cpu' (https://github.com/AlexeyAB/darknet/blob/527578744b46666fb5cd42393bf9e1fa9af126ee/src/network.c#L201). Could this be the most fundamental difference? I assume you call 'fill' somewhere else as well; so where should we add a 'fill' for the depthwise layers?