salbatron closed this issue 5 years ago.
@salbatron Can you show the output of the nvidia-smi command?
Hi @AlexeyAB, here is the nvidia-smi output: [screenshot not preserved]
@AlexeyAB Same problem on Windows, with the latest code.
@salbatron @goodtogood Thanks!
Can you detect successfully using this commit from Nov 26, 2018? https://github.com/AlexeyAB/darknet/tree/21a4ec9390b61c0baa7ef72e72e59fa143daba4c
To download: https://github.com/AlexeyAB/darknet/archive/21a4ec9390b61c0baa7ef72e72e59fa143daba4c.zip
Also try downloading the newest cuDNN v7.4.1 (Nov 8, 2018) for CUDA 10.0: https://developer.nvidia.com/rdp/form/cudnn-download-survey
Yes, with older commits detection works just fine.
cuDNN v7.4.1 didn't resolve the problem.
@salbatron I added 2 fixes. Try updating your code from GitHub and recompiling. Does it help?
@AlexeyAB Detection is all right after your fix, but I noticed that detection time increased too. Now detection time is the same with CUDNN_HALF = 0 and CUDNN_HALF = 1.
Detection time had decreased, but there were no detections with CUDNN_HALF = 0.
@salbatron Thanks!
Try changing these 2 lines in these 2 places:
To this line:
```c
if (state.index != 0 && state.net.cudnn_half && !l.xnor && (!state.train || iteration_num > 3*state.net.burn_in))
```
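The condition above decides, per layer, whether the FP16 (CUDNN_HALF) path is used. As a sketch only, here is the same boolean logic pulled out into a standalone C function; the name use_half_path and the plain parameters standing in for darknet's state/l fields are hypothetical, not darknet's API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical standalone version of the darknet condition above.
 * Assumed parameter mapping:
 *   layer_index   -> state.index
 *   cudnn_half    -> state.net.cudnn_half
 *   xnor          -> l.xnor
 *   train         -> state.train
 *   iteration_num -> iteration_num
 *   burn_in       -> state.net.burn_in */
static bool use_half_path(int layer_index, bool cudnn_half, bool xnor,
                          bool train, int iteration_num, int burn_in)
{
    assert(layer_index >= 0); /* layer indices start at 0 */
    return layer_index != 0                          /* the first layer is never run in FP16 */
        && cudnn_half                                /* FP16 enabled for this network */
        && !xnor                                     /* XNOR (binary) layers are excluded */
        && (!train || iteration_num > 3 * burn_in);  /* in training, only after 3*burn_in iterations */
}
```

For example, during detection (train = false) every layer except the first qualifies, while during training with burn_in = 1000 the FP16 path is only taken once iteration_num exceeds 3000.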
That worked, thank you 👍 RTX 2080:
CUDNN_HALF = 0: detection time ~20.5 ms
CUDNN_HALF = 1: detection time ~19 ms
With the earlier wrong detections: ~17.5 ms 😄
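To put per-image latencies like these in throughput terms, FPS = 1000 / latency_ms, and the relative speedup is the ratio of the two latencies. A small sketch of that arithmetic; the helper names ms_to_fps and speedup are hypothetical:

```c
#include <assert.h>

/* Convert per-image detection latency in milliseconds to frames per second. */
static double ms_to_fps(double latency_ms)
{
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}

/* Speedup of new_ms relative to baseline_ms (a value > 1 means faster). */
static double speedup(double baseline_ms, double new_ms)
{
    assert(baseline_ms > 0.0 && new_ms > 0.0);
    return baseline_ms / new_ms;
}
```

With the RTX 2080 numbers above, ms_to_fps(20.5) is about 48.8 FPS and ms_to_fps(19.0) about 52.6 FPS, so speedup(20.5, 19.0) is roughly 1.08x.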
@salbatron Thank you very much for your tests! )
The speedup is small, maybe because some arrays are re-allocated during the first detection; I just fixed that.
Earlier, you could try compiling with CUDNN_HALF=1, running this command once:
```
darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights -i 0 -thresh 0.25 -ext_output
```
and then entering dog.jpg several times.
Or maybe it is because newer cuDNN versions (>= 7.2) use Tensor Cores (FP16) even when CUDNN_HALF=0; the cuDNN library does this automatically.
Try commenting out this line and compiling with CUDNN_HALF=0:
https://github.com/AlexeyAB/darknet/blob/25f133d6efcdc1d2aa79263e4fe4880ca782c79e/src/convolutional_layer.c#L199
Then the difference between CUDNN_HALF=0 and CUDNN_HALF=1 may be larger.
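Entering dog.jpg several times, as suggested above, helps because the first detection includes one-time array allocations. When comparing CUDNN_HALF=0 and CUDNN_HALF=1, the same idea can be applied to collected timings by skipping the warm-up runs before averaging. A minimal sketch, assuming the per-image latencies (in ms) were already collected into an array; mean_after_warmup is a hypothetical helper, not part of darknet:

```c
#include <assert.h>
#include <stddef.h>

/* Average latency over runs_ms[warmup..n-1], skipping the first `warmup`
 * runs whose timings include one-time allocations. */
static double mean_after_warmup(const double *runs_ms, size_t n, size_t warmup)
{
    assert(runs_ms != NULL && n > warmup); /* need at least one measured run */
    double sum = 0.0;
    for (size_t i = warmup; i < n; i++)
        sum += runs_ms[i];
    return sum / (double)(n - warmup);
}
```

Using the same warmup count for both builds keeps the comparison fair; a warmup of 1 matches the "first detection" effect described above.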
@salbatron I added another fix for a small speedup when CUDNN_HALF=1; update your code from GitHub.
@AlexeyAB It works now. On Windows, CUDNN_HALF=1 performs slightly better than CUDNN_HALF=0 (23.6 vs 20 FPS). The model is yolov3_spp. Is this normal?
@AlexeyAB Now the detection time is ~17.5 ms with CUDNN_HALF = 1 and detections are good. Thank you very much!
@goodtogood I find yolov3_spp slower than the standard yolov3. Try with yolov3.
@salbatron I also noticed that, but I just want to see how much improvement we can get with CUDNN_HALF enabled.
@goodtogood I got ~27 ms with CUDNN_HALF = 1 and ~36 ms with CUDNN_HALF = 0, with yolov3-spp on RTX 2080
@goodtogood What GPU did you use?
@salbatron Thanks for your info.
@AlexeyAB Mine is a cheaper one, an RTX 2070.
CUDNN_HALF = 1 detection on version fb1ee79: [screenshot not preserved]
Am I missing something?