facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

Docker (GPU support, CUDA 8.0, cuDNN 7) test failed #1431

Open hiepph opened 6 years ago

hiepph commented 6 years ago

Following the nvidia-docker installation instructions in the tutorial, I pulled the latest image (GPU support, CUDA 8.0, cuDNN 7) and ran a test:

nvidia-docker run -it caffe2ai/caffe2:latest python -m caffe2.python.operator_test.relu_op_test

But it failed with the following error:

Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([-0.98402572, -0.10023266,  0.5741834 , -0.5871824 ], dtype=float32), gc=, dc=[, device_type: 1], engine=u'CUDNN')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([ 0.], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'')
Failure in checking device option 1 and output Y. The outputs are:
[ 0.]
[ 0.02]
0.02
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 523, in evaluate_test_data
    self.search_strategy, self.test,
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 115, in run
    return test(*args, **kwargs)
  File "/usr/local/caffe2/python/operator_test/relu_op_test.py", line 26, in test_relu
    self.assertDeviceChecks(dc, op, [X], [0])
  File "/usr/local/caffe2/python/hypothesis_test_util.py", line 325, in assertDeviceChecks
    dc.CheckSimple(op, inputs, outputs_to_check, input_device_options)
  File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([ 0.], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'')
Failure in checking device option 1 and output Y. The outputs are:
[ 0.]
[ 0.02]
0.02
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 523, in evaluate_test_data
    self.search_strategy, self.test,
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 115, in run
    return test(*args, **kwargs)
  File "/usr/local/caffe2/python/operator_test/relu_op_test.py", line 26, in test_relu
    self.assertDeviceChecks(dc, op, [X], [0])
  File "/usr/local/caffe2/python/hypothesis_test_util.py", line 325, in assertDeviceChecks
    dc.CheckSimple(op, inputs, outputs_to_check, input_device_options)
  File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([ 0.], dtype=float32), gc=, dc=[, device_type: 1], engine=u'')
Failure in checking device option 1 and output Y. The outputs are:
[ 0.]
[ 0.02]
0.02
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 523, in evaluate_test_data
    self.search_strategy, self.test,
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 115, in run
    return test(*args, **kwargs)
  File "/usr/local/caffe2/python/operator_test/relu_op_test.py", line 26, in test_relu
    self.assertDeviceChecks(dc, op, [X], [0])
  File "/usr/local/caffe2/python/hypothesis_test_util.py", line 325, in assertDeviceChecks
    dc.CheckSimple(op, inputs, outputs_to_check, input_device_options)
  File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

Falsifying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([ 0.], dtype=float32), gc=, dc=[, device_type: 1], engine=u'')
Failure in checking device option 1 and output Y. The outputs are:
[ 0.]
[ 0.02]
0.02
F
======================================================================
FAIL: test_relu (__main__.TestRelu)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/caffe2/python/operator_test/relu_op_test.py", line 19, in test_relu
    engine=st.sampled_from(["", "CUDNN"]),
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 721, in wrapped_test
    state.run()
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 614, in run
    print_example=True, is_final=True
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/usr/local/lib/python2.7/dist-packages/hypothesis/core.py", line 115, in run
    return test(*args, **kwargs)
  File "/usr/local/caffe2/python/operator_test/relu_op_test.py", line 26, in test_relu
    self.assertDeviceChecks(dc, op, [X], [0])
  File "/usr/local/caffe2/python/hypothesis_test_util.py", line 325, in assertDeviceChecks
    dc.CheckSimple(op, inputs, outputs_to_check, input_device_options)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 6.212s
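
The three lines printed after "The outputs are:" appear to be the two devices' outputs for Y, followed by their largest absolute difference. A minimal NumPy sketch of that kind of tolerance check is given here for readers of the log. The helper name `outputs_agree` and the 0.01 threshold are assumptions for illustration, not taken from the Caffe2 source:

```python
import numpy as np


def outputs_agree(first, second, threshold=0.01):
    """Compare two device outputs element-wise.

    Returns (ok, max_abs_diff). The threshold value is an assumption
    chosen for illustration only.
    """
    first = np.asarray(first, dtype=np.float32)
    second = np.asarray(second, dtype=np.float32)
    # Largest element-wise gap, like the third number printed in the log.
    max_abs_diff = float(np.max(np.abs(first - second)))
    # Same tolerance used for both the absolute and the relative term.
    ok = bool(np.allclose(first, second, atol=threshold, rtol=threshold))
    return ok, max_abs_diff


# Values copied from the log above: one output is [0.], the other [0.02].
ok, diff = outputs_agree([0.0], [0.02])
print(ok, diff)
```

Under these assumptions the check fails because the gap between the two outputs exceeds the tolerance, which would match the `AssertionError: False is not true` raised from `assertDeviceChecks`.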

Ignoring the test failure, I tried to run the char_rnn.py example:

nvidia-docker run -it caffe2ai/caffe2:latest /bin/bash

# after downloading the training data and char_rnn.py
python char_rnn.py --train_data shakespeare.txt --gpu

As expected, it also failed:

Input has 62 characters. Total input size: 99993
DEBUG:char_rnn:Start training
DEBUG:char_rnn:Training model
Traceback for operator 10 in network char_rnn
Entering interactive debugger. Type "bt" to print the full stacktrace. Type "help" to see command listing.
[enforce fail at reshape_op.h:96] total_size == size. 1550 vs 0. Argument `shape` does not agree with the input data. (1550 != 0) Error from operator: 
input: "softmax_reshaped_grad" input: "_" output: "softmax_grad" output: "_softmax_grad_dims" name: "" type: "Reshape" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true
> /usr/local/caffe2/python/workspace.py(166)CallWithExceptionIntercept()
-> return func(*args, **kwargs)
(Pdb) bt
  /usr/local/caffe2/python/utils.py(258)run()
-> return func()
  /usr/local/caffe2/python/utils.py(290)func()
-> return f(*args, **kwargs)
  /home/char_rnn.py(286)main()
-> model.TrainModel()
  /home/char_rnn.py(197)TrainModel()
-> workspace.RunNet(self.model.net.Name())
  /usr/local/caffe2/python/workspace.py(201)RunNet()
-> StringifyNetName(name), num_iter, allow_fail,
> /usr/local/caffe2/python/workspace.py(166)CallWithExceptionIntercept()
-> return func(*args, **kwargs)
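
The enforce message says that the number of elements implied by the `shape` argument (1550) does not match the number of elements in the input (0), meaning the Reshape gradient op received an empty tensor. A minimal sketch of that size check in plain NumPy follows. The helper `check_reshape` and the shape `(25, 62)` are hypothetical, chosen only because their product is 1550 and the log reports 62 characters:

```python
import numpy as np


def check_reshape(input_size, shape):
    """Raise if the target shape does not hold exactly input_size elements."""
    total_size = int(np.prod(shape))
    if total_size != input_size:
        raise ValueError(
            "Argument `shape` does not agree with the input data. "
            "({} != {})".format(total_size, input_size)
        )
    return total_size


# Hypothetical shape whose element count is 1550, applied to an empty input.
try:
    check_reshape(0, (25, 62))
except ValueError as err:
    message = str(err)
    print(message)
```

An input size of 0 at this point suggests that an upstream blob was never filled, rather than a problem in the Reshape op itself.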

What went wrong?

YanWang2014 commented 6 years ago

Same problem here. Is there any solution? Could it be related to the GPU? I am testing on a GT750M.