AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

cudnn_half wrong detections #2040

Closed salbatron closed 5 years ago

salbatron commented 5 years ago

Detection with CUDNN_HALF = 1 on version fb1ee79: (image)

Am I missing something?

AlexeyAB commented 5 years ago

@salbatron

  1. What command do you use?
  2. What cfg-file do you use?
  3. What CUDA and cuDNN version do you use?
  4. What GPU do you use? Can you show the output of the nvidia-smi command?
  5. Do you use Linux or Windows?

salbatron commented 5 years ago

Hi @AlexeyAB

  1. darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights -i 0 -thresh 0.25 dog.jpg -ext_output
  2. cfg is standard yolov3.cfg
  3. CUDA 10.0, cuDNN 7.3
  4. RTX 2080
  5. Windows

nvidia-smi: (image)

BackT0TheFuture commented 5 years ago

@AlexeyAB Same problem on Windows; the code is the latest.

AlexeyAB commented 5 years ago

@salbatron @goodtogood Thanks!

Can you successfully detect using this commit from Nov 26, 2018? https://github.com/AlexeyAB/darknet/tree/21a4ec9390b61c0baa7ef72e72e59fa143daba4c To download: https://github.com/AlexeyAB/darknet/archive/21a4ec9390b61c0baa7ef72e72e59fa143daba4c.zip

Also try downloading the newest cuDNN v7.4.1 (Nov 8, 2018) for CUDA 10.0: https://developer.nvidia.com/rdp/form/cudnn-download-survey

salbatron commented 5 years ago

Yes, with older commits detection works just fine.

cuDNN v7.4.1 didn't resolve the problem.

AlexeyAB commented 5 years ago

@salbatron I added 2 fixes. Try updating your code from GitHub and recompiling. Does it help?

salbatron commented 5 years ago

@AlexeyAB Detection is all right after your fix, but I noticed that detection time increased too. Now detection time is the same with CUDNN_HALF = 0 and CUDNN_HALF = 1.

AlexeyAB commented 5 years ago

@salbatron Try un-commenting these 2 lines:

salbatron commented 5 years ago

Detection time decreased, but I get no detections with CUDNN_HALF = 0.

AlexeyAB commented 5 years ago

@salbatron Thanks!

Try changing these 2 lines in these 2 places:

To this line: if (state.index != 0 && state.net.cudnn_half && !l.xnor && (!state.train || iteration_num > 3*state.net.burn_in))
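For readers following along, the condition quoted above can be read term by term. The sketch below is not Darknet code: the `*_sketch` structs and the `use_half_path` helper are hypothetical stand-ins that keep only the fields the condition mentions, and it assumes that `state.index` is the index of the current layer and that the condition gates the half-precision (FP16) branch.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-ins for the structs named in the
   quoted condition; only the fields it reads are kept. */
typedef struct {
    int cudnn_half;   /* non-zero when half precision is enabled */
    int burn_in;      /* number of burn-in iterations */
} net_sketch;

typedef struct {
    int xnor;         /* non-zero for XNOR layers */
} layer_sketch;

typedef struct {
    int index;        /* assumed: index of the current layer */
    int train;        /* non-zero while training, 0 for inference */
    net_sketch net;
} state_sketch;

/* Evaluates the same boolean expression as the quoted line.
   Assumption: a true result means the FP16 branch is taken. */
static bool use_half_path(state_sketch state, layer_sketch l, int iteration_num)
{
    return state.index != 0          /* skip layer 0 */
        && state.net.cudnn_half      /* half precision must be enabled */
        && !l.xnor                   /* XNOR layers are excluded */
        && (!state.train             /* inference always qualifies */
            || iteration_num > 3 * state.net.burn_in); /* training: only after 3x burn-in */
}
```

Under these assumptions, inference on any layer other than the first qualifies as soon as cudnn_half is set, while training qualifies only once the iteration count exceeds three times burn_in.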

salbatron commented 5 years ago

That worked, thank you 👍

RTX 2080:
CUDNN_HALF = 0 detection time ~20.5 ms
CUDNN_HALF = 1 detection time ~19 ms

With the wrong detections from the start it was ~17.5 ms 😄

AlexeyAB commented 5 years ago

@salbatron Thank you very much for your tests! )

There is only a small speedup:

  1. Maybe because some arrays are re-allocated during the first detection. I just fixed that. With the earlier code you could compile with CUDNN_HALF=1, run this command once: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights -i 0 -thresh 0.25 -ext_output and then enter dog.jpg several times.

  2. Or maybe because with the new cuDNN >= 7.2 Tensor Cores (FP16) are used even when CUDNN_HALF=0; cuDNN does this automatically inside the library. Try commenting out this line and compiling with CUDNN_HALF=0: https://github.com/AlexeyAB/darknet/blob/25f133d6efcdc1d2aa79263e4fe4880ca782c79e/src/convolutional_layer.c#L199

Maybe the difference between CUDNN_HALF=0 and CUDNN_HALF=1 will then be larger.
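The first point, that the first detection can be slower because of one-time allocations, is the usual reason to discard warm-up runs when timing. A minimal sketch of that idea follows. It is not part of Darknet: `now_ms`, `mean_ms_after_warmup`, and the `count_call` stand-in workload are hypothetical names, and the code assumes a C11 toolchain that provides `timespec_get`.

```c
#include <stddef.h>
#include <time.h>

/* Wall-clock time in milliseconds, using C11 timespec_get. */
static double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Stand-in workload for demonstration: counts how often it is called.
   In a real measurement this would wrap one detection call. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}

/* Calls fn `warmup` times without timing, then `runs` times with timing,
   and returns the mean milliseconds per timed call.
   Returns -1.0 when fn is NULL or runs is not positive. */
static double mean_ms_after_warmup(void (*fn)(void *), void *arg,
                                   int warmup, int runs)
{
    if (fn == NULL || runs <= 0) {
        return -1.0;
    }
    for (int i = 0; i < warmup; ++i) {
        fn(arg);                      /* untimed: absorbs one-time setup cost */
    }
    double start = now_ms();
    for (int i = 0; i < runs; ++i) {
        fn(arg);                      /* timed calls */
    }
    return (now_ms() - start) / runs;
}
```

When the workload launches GPU work asynchronously, the wrapped call should return only after the results are available; otherwise the clock is read before the work has finished.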

AlexeyAB commented 5 years ago

@salbatron I added one more fix, for a small speedup when CUDNN_HALF=1. Update your code from GitHub.

BackT0TheFuture commented 5 years ago

@AlexeyAB It works now. On Windows it performs slightly better with CUDNN_HALF=1 than with CUDNN_HALF=0 (FPS 20 vs. 23.6). The model is yolov3_spp. Is that normal?

salbatron commented 5 years ago

@AlexeyAB Now the detection time is ~17.5 ms with CUDNN_HALF = 1 and the detections are good. Thank you very much.

@goodtogood I find yolov3_spp slower than standard yolov3. Try with yolov3.

BackT0TheFuture commented 5 years ago

@salbatron I also noticed that, but I just want to see how much improvement we can get when CUDNN_HALF is enabled.

salbatron commented 5 years ago

@goodtogood I got ~27 ms with CUDNN_HALF = 1 and ~36 ms with CUDNN_HALF = 0, with yolov3-spp on RTX 2080
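The thread mixes per-image times in milliseconds with FPS figures, so a small conversion helps when comparing them. The helpers below are a hypothetical sketch, not Darknet code. With the numbers reported here, 36 ms against 27 ms is a speedup of about 1.33x (roughly 27.8 FPS against 37.0 FPS), and 20.5 ms against 17.5 ms for yolov3 is about 1.17x.

```c
/* Frames per second for a given per-image time in milliseconds.
   Returns 0.0 for non-positive input. */
static double fps_from_ms(double ms)
{
    return ms > 0.0 ? 1000.0 / ms : 0.0;
}

/* Speedup factor of new_ms relative to base_ms (values above 1.0 mean
   the new configuration is faster). Returns 0.0 for non-positive new_ms. */
static double speedup(double base_ms, double new_ms)
{
    return new_ms > 0.0 ? base_ms / new_ms : 0.0;
}
```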

AlexeyAB commented 5 years ago

@goodtogood What GPU did you use?

"It performs slightly better (fps 20 VS 23.6) than CUDNN_HALF=0 when CUDNN_HALF=1 on windows. the model is yolov3_spp. Is it normal ?"

BackT0TheFuture commented 5 years ago

@salbatron Thanks for the info.

@AlexeyAB Mine is a cheaper one, an RTX 2070.