NVIDIA / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

CuDNNConvolutionLayerTest failed at random #519

Closed twmht closed 6 years ago

twmht commented 6 years ago

Hi,

I found that CuDNNConvolutionLayerTest/TestGradientCuDNN fails at random with the double type.

Here is the log:

Note: Google Test filter = CuDNNConvolutionLayerTest/*.TestGradientCuDNN
Note: Randomizing tests' orders with a seed of 21590 .
[==========] Running 2 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from CuDNNConvolutionLayerTest/1, where TypeParam = double
[ RUN      ] CuDNNConvolutionLayerTest/1.TestGradientCuDNN
./include/caffe/test/test_gradient_check_util.hpp:169: Failure
The difference between computed_gradient and estimated_gradient is 0.25806093215942383, which exceeds threshold_ * scale, where
computed_gradient evaluates to -3.4281129837036133,
estimated_gradient evaluates to -3.6861739158630371, and
threshold_ * scale evaluates to 0.036861740052700043.
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,0, 0; feat = 0.025121258571743965; objective+ = -8.9507465362548828; objective- = -8.7295761108398438; stepsize_ = 0.029999999329447746
[  FAILED  ] CuDNNConvolutionLayerTest/1.TestGradientCuDNN, where TypeParam = double (2242 ms)
[----------] 1 test from CuDNNConvolutionLayerTest/1 (2242 ms total)

[----------] 1 test from CuDNNConvolutionLayerTest/0, where TypeParam = float
[ RUN      ] CuDNNConvolutionLayerTest/0.TestGradientCuDNN
[       OK ] CuDNNConvolutionLayerTest/0.TestGradientCuDNN (1338 ms)
[----------] 1 test from CuDNNConvolutionLayerTest/0 (1338 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (3580 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CuDNNConvolutionLayerTest/1.TestGradientCuDNN, where TypeParam = double
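
The numbers in the failure message can be cross-checked by hand. The sketch below is only a sanity check on the logged values, not the gradient checker's actual code: it assumes the estimate is a two-sided finite difference of objective+ and objective-, and that the tolerance is a threshold scaled by max(|computed|, |estimated|, 1). The helper names are hypothetical, and the threshold itself is not printed in the log, so it is inferred from the logged tolerance.

```python
# Values copied verbatim from the failure message in the log.
computed_gradient = -3.4281129837036133
logged_estimate = -3.6861739158630371
logged_tolerance = 0.036861740052700043  # "threshold_ * scale" in the log
objective_plus = -8.9507465362548828
objective_minus = -8.7295761108398438
stepsize = 0.029999999329447746


def central_difference(obj_plus, obj_minus, step):
    """Two-sided finite-difference estimate of a derivative (assumed form)."""
    return (obj_plus - obj_minus) / (2.0 * step)


def gradient_scale(computed, estimated):
    """Larger gradient magnitude, floored at 1 (assumed scaling rule)."""
    return max(abs(computed), abs(estimated), 1.0)


# Recompute the estimate from the two logged objective values.
estimated = central_difference(objective_plus, objective_minus, stepsize)

# The threshold is not in the log; infer it as tolerance / scale.
scale = gradient_scale(computed_gradient, logged_estimate)
inferred_threshold = logged_tolerance / scale

# Compare the mismatch against the tolerance, as the failed check does.
difference = abs(computed_gradient - estimated)
tolerance = inferred_threshold * gradient_scale(computed_gradient, estimated)

print(f"recomputed estimate: {estimated:.7f}")
print(f"inferred threshold:  {inferred_threshold:.6f}")
print(f"difference:          {difference:.7f}")
print(f"tolerance:           {tolerance:.7f}")
print("check passes" if difference <= tolerance else "check fails")
```

Under these assumptions the recomputed estimate agrees with the logged estimated_gradient to within single-precision rounding, so the log is internally consistent: the computed gradient differs from the numerical estimate by roughly 7% of its magnitude, well outside the tolerance.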

Environment: cudnn8 + cuda6 + GTX 970

Any idea?

drnikolaev commented 6 years ago

@twmht thanks. Yes, in some cases the double test fails, but it's hard to reproduce. I assume you mean cuda8 and cudnn6, right? Please try upgrading.