After merging BN + Scale into Conv, outputs differ for the same input in GPU mode

Environment: Ubuntu 18.04, CUDA 10.0.130, cuDNN 7.5.1, Python 3.6.9.

The model backbone is MobileFaceNet. The depthwise (dw) layers are implemented as Convolution layers with group, and each dw layer sets convolution_param/engine: CAFFE.

With the MobileNet-YOLO version of Caffe (https://github.com/eric612/MobileNet-YOLO): after merging BN + Scale into Conv, I ran the model on the same input 100 times.
1. In CPU mode, the output is the same on every run.
2. In GPU mode, the output differs between runs.
3. In GPU mode with convolution_param/engine: CAFFE set on all convolution layers, the output is the same on every run.
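For reference, the merge step is usually done roughly like the sketch below. This is not the exact script used for this report, only a minimal pure-Python illustration of folding BatchNorm + Scale into the preceding convolution, per output channel. The function name `fold_bn_scale` and its argument layout are hypothetical. It assumes the BatchNorm mean and variance have already been divided by Caffe's moving-average factor blob, and that `eps` matches the layer's setting.

```python
import math

def fold_bn_scale(weights, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold BatchNorm + Scale into the preceding convolution (hypothetical helper).

    weights: list with one flattened kernel (list of floats) per output channel
    bias, mean, var, gamma, beta: one float per output channel
    Assumption: mean/var are already normalized by the moving-average factor.
    """
    new_weights, new_bias = [], []
    for w_c, b_c, m, v, g, bt in zip(weights, bias, mean, var, gamma, beta):
        # Scale(BN(y)) = g * (y - m) / sqrt(v + eps) + bt, with y = w . x + b
        k = g / math.sqrt(v + eps)
        new_weights.append([k * w for w in w_c])
        new_bias.append(k * (b_c - m) + bt)
    return new_weights, new_bias
```

If the convolution has no bias term, pass zeros for `bias`; the merged layer then needs a bias term enabled to hold `new_bias`.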

With BVLC/caffe, running the merged model on GPU:
1. With convolution_param/engine: CAFFE set for the dw layers, it works normally.
2. With convolution_param/engine: CAFFE commented out for the dw layers, it does not run and fails with the error "Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR".
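The repeated-run comparison in the experiments above can be sketched as follows. This is only an illustration under assumptions: `run_forward` is a hypothetical zero-argument callable that feeds the fixed input through the net and returns the flattened output as a list of floats. It is not part of Caffe.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length outputs."""
    return max(abs(x - y) for x, y in zip(a, b))

def check_repeatability(run_forward, n_runs=100):
    """Run the same forward pass n_runs times and compare each run to the first.

    run_forward: hypothetical callable returning the output as a list of floats.
    Returns the worst deviation seen; 0.0 means every run matched the first exactly.
    """
    reference = run_forward()
    worst = 0.0
    for _ in range(n_runs - 1):
        worst = max(worst, max_abs_diff(reference, run_forward()))
    return worst
```

A result of exactly 0.0 corresponds to "output is the same" above; any nonzero value corresponds to "output is different". Reporting the size of the deviation may help tell small floating-point reordering effects apart from a real bug.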

BVLC/caffe issue #6991