ethanhe42 / channel-pruning

Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)
https://arxiv.org/abs/1707.06168
MIT License

Strange: cudnn5 makes "average forward pass" slower? #67

Closed s5248 closed 6 years ago

s5248 commented 6 years ago

What hardware and operating system/distribution are you running?

Operating system: CentOS Linux release 7.4.1708
CUDA version: cuda-8.0
cuDNN version: cudnn.so.5
OpenCV version: opencv2.4
BLAS: OpenBLAS
Python version: Python 3.4

I compiled Caffe twice, once with `USE_CUDNN := 1` and once with `# USE_CUDNN := 1` (cuDNN disabled), then ran the commands below. Strangely, the cuDNN5 build makes the channel-pruned model's "Average Forward pass" slower. The prototxt and caffemodel files were downloaded from the link you provide. To give more information, I list the normal results and the abnormal results in the blocks below, respectively. As I am not familiar with CUDA and cuDNN, this behavior really confuses me; thanks for any comments.
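For reference, the toggle I describe amounts to roughly the following sketch (the demo file path is just for illustration; in a real build this edits Caffe's `Makefile.config` before running `make`):

```shell
# Sketch of enabling cuDNN in a Caffe build. The demo file path is
# hypothetical; the flag name comes from the standard Makefile.config.
printf '# USE_CUDNN := 1\n' > /tmp/Makefile.config.demo

# Uncomment the flag to enable cuDNN:
sed -i 's/^# *USE_CUDNN := 1/USE_CUDNN := 1/' /tmp/Makefile.config.demo
grep '^USE_CUDNN' /tmp/Makefile.config.demo   # prints: USE_CUDNN := 1

# Then rebuild and time the model, e.g.:
#   make clean && make -j"$(nproc)"
#   caffe/build/tools/caffe time -model temp/vgg.prototxt \
#       -weights temp/vgg.caffemodel -iterations 10 -gpu 0
```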

#caffe/build/tools/caffe time -model temp/vgg.prototxt -weights temp/vgg.caffemodel -iterations 10 -gpu  0 
# Average Forward pass: 242.853 ms.
# Average Backward pass: 713.927 ms.
# Average Forward-Backward: 956.982 ms.
##use cudnn5
# Average Forward pass: 198.034 ms.
# Average Backward pass: 356.98 ms.
# Average Forward-Backward: 555.226 ms.

#caffe/build/tools/caffe time -model temp/bn_vgg.prototxt   -weights temp/bn_vgg.caffemodel -iterations 10 -gpu 0
# Average Forward pass: 243.775 ms.
# Average Backward pass: 715.241 ms.
# Average Forward-Backward: 959.264 ms.
##use cudnn5
# Average Forward pass: 219.921 ms.
# Average Backward pass: 356.2 ms.
# Average Forward-Backward: 576.338 ms.

#caffe/build/tools/caffe time  -model temp/bak/cb_3c_3C4x_mem_bn_vgg.prototxt  -weights temp/bak/cb_3c_vgg.caffemodel  -iterations 10 -gpu 0 
# Average Forward pass: 174.96 ms.
# Average Backward pass: 1008.17 ms.
# Average Forward-Backward: 1183.51 ms.
##use cudnn5
# Average Forward pass: 122.147 ms.
# Average Backward pass: 245.928 ms.
# Average Forward-Backward: 368.314 ms.

#after finetune
#caffe/build/tools/caffe time -model temp/bak/cb_3c_3C4x_mem_bn_vgg.prototxt  -weights temp/bak/models/3C4X_iter_200000.caffemodel -iterations 10 -gpu  0 
# Average Forward pass: 173.536 ms.
# Average Backward pass: 1004.33 ms.
# Average Forward-Backward: 1178.2 ms.
##use cudnn5
# Average Forward pass: 121.938 ms.
# Average Backward pass: 245.541 ms.
# Average Forward-Backward: 367.757 ms.
#caffe/build/tools/caffe time -model temp/channel_pruning.prototxt  -weights temp/channel_pruning.caffemodel -iterations 10 -gpu  0 
# Average Forward pass: 114.38 ms.
# Average Backward pass: 485.261 ms.
# Average Forward-Backward: 599.849 ms.
##use cudnn5
# Average Forward pass: 300.156 ms.
# Average Backward pass: 119.61 ms.
# Average Forward-Backward: 419.968 ms.

#caffe/build/tools/caffe time -model temp/channel_pruning_VGG-16_3C4x.prototxt  -weights temp/channel_pruning_VGG-16_3C4x.caffemodel  -iterations 10 -gpu 0
# Average Forward pass: 164.283 ms.
# Average Backward pass: 985.954 ms.
# Average Forward-Backward: 1150.5 ms.
##use cudnn5
**# Average Forward pass: 302.349 ms.**
# Average Backward pass: 227.605 ms.
# Average Forward-Backward: 530.039 ms.
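To compare the two builds side by side, a small helper like the one below pulls the "Average Forward pass" time out of saved `caffe time` logs and prints the ratio. The log file names are assumptions, and the demo logs are seeded with the `channel_pruning` numbers reported above (real `caffe time` log lines carry a glog prefix before the same text):

```shell
# Hypothetical comparison helper: extract the forward-pass time from two
# saved `caffe time` logs and print the cuDNN5 / no-cuDNN ratio.
fwd_ms() { grep 'Average Forward pass' "$1" | awk '{print $(NF-1)}'; }

# Demo logs seeded with the channel_pruning numbers from this issue:
echo 'Average Forward pass: 114.38 ms.'  > /tmp/no_cudnn.log
echo 'Average Forward pass: 300.156 ms.' > /tmp/cudnn5.log

base=$(fwd_ms /tmp/no_cudnn.log)
cudnn=$(fwd_ms /tmp/cudnn5.log)
awk -v a="$cudnn" -v b="$base" \
    'BEGIN { printf "cuDNN5 forward is %.2fx the non-cuDNN time\n", a/b }'
# prints: cuDNN5 forward is 2.62x the non-cuDNN time
```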
ethanhe42 commented 6 years ago

https://github.com/yihui-he/channel-pruning/wiki/inference-time-on-GPU