Open lukeyeager opened 7 years ago
1d convolution still fails with the same error.
3d convolution is possibly worse?
$ pytest -sv caffe2/python/operator_test/conv_test.py::TestConvolution::test_3d_convolution_nchw
Trying example: test_3d_convolution_nchw(self=<conv_test.TestConvolution testMethod=test_3d_convolution_nchw>, input_channels=2, output_channels=1, batch_size=1, stride=1, size=4, kernel=2, dilation=2, pad=0, use_bias=True, gc=<caffe2.proto.caffe2_pb2.DeviceOption at 0x7fcc28dfeb90>, dc=[<caffe2.proto.caffe2_pb2.DeviceOption at 0x7fcc28dfeb18>,
<caffe2.proto.caffe2_pb2.DeviceOption at 0x7fcc28dfeb90>])
F0830 23:07:05.518461 7324 context_gpu.cu:357] Error at: /caffe2/caffe2/core/context_gpu.cu:357: an illegal memory access was encountered
Aborted (core dumped)
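For reference, the failing example above looks like a valid configuration rather than a degenerate one. A minimal sketch, assuming the standard dilated-convolution output-size formula (the helper name `conv_output_size` is hypothetical and is not part of caffe2):

```python
def conv_output_size(size, kernel, dilation, pad, stride):
    """Output length along one spatial dimension of a dilated convolution.

    Hypothetical helper for illustration only, not a caffe2 function.
    """
    # A dilated kernel spans dilation * (kernel - 1) + 1 input elements.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * pad - effective_kernel) // stride + 1


# Values taken from the failing hypothesis example in the log above.
out = conv_output_size(size=4, kernel=2, dilation=2, pad=0, stride=1)
print(out)  # 2
```

Under that assumption each spatial dimension has an output size of 2, so the illegal memory access is not explained by an empty or negative output shape.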
@lukeyeager It's a bug in im2col_nd_gpu_kernel. I tried to track it down, but I wasn't able to get cuda-gdb to work with caffe2, which makes debugging the device code very hard.
There are so many hypothesis configs for this test that it may take several runs before this error appears. Alternatively, you can make the test more robust by decorating it with
@hypothesis.settings(max_examples=50)
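As a rough illustration of where that decorator goes, here is a minimal standalone sketch. It assumes the `hypothesis` package is installed; the test name, the strategy ranges, and the property being checked are all hypothetical and are not taken from conv_test.py:

```python
from hypothesis import given, settings
from hypothesis import strategies as st

seen = []  # records the examples hypothesis generated, for inspection only


# settings(max_examples=50) caps how many generated examples one run tries.
# deadline=None only disables the per-example time limit for this sketch.
@settings(max_examples=50, deadline=None)
@given(kernel=st.integers(1, 3), dilation=st.integers(1, 3))
def test_dilated_extent(kernel, dilation):
    # Hypothetical property: a dilated kernel never spans fewer input
    # elements than the undilated kernel does.
    extent = dilation * (kernel - 1) + 1
    seen.append((kernel, dilation))
    assert extent >= kernel


# Calling the decorated function directly runs the whole property test.
test_dilated_extent()
```

In the real test the same `settings` decorator would sit on `test_3d_convolution_nchw` next to its existing `given` decorator. How often the failing configuration shows up will still depend on the strategies that test uses.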