apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

test_operator_gpu.test_kernel_error_checking Fails #16353

Open ChaiBapchya opened 4 years ago

ChaiBapchya commented 4 years ago

Seen on the UNIX-GPU pipeline for an unrelated PR, #16328: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16328/2/pipeline

test_operator_gpu.test_kernel_error_checking ... Process SpawnProcess-5:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/work/mxnet/tests/python/gpu/test_operator_gpu.py", line 2206, in kernel_error_check_imperative
    c = (a / b).asnumpy()
  File "/work/mxnet/tests/python/unittest/../../../python/mxnet/ndarray/ndarray.py", line 331, in __truediv__
    return divide(self, other)
  File "/work/mxnet/tests/python/unittest/../../../python/mxnet/ndarray/ndarray.py", line 3673, in divide
    _internal._rdiv_scalar)
  File "/work/mxnet/tests/python/unittest/../../../python/mxnet/ndarray/ndarray.py", line 3429, in _ufunc_helper
    return fn_array(lhs, rhs)
  File "<string>", line 52, in broadcast_div
  File "/work/mxnet/tests/python/unittest/../../../python/mxnet/_ctypes/ndarray.py", line 107, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/work/mxnet/tests/python/unittest/../../../python/mxnet/base.py", line 254, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [00:50:08] src/operator/contrib/tvmop/../../tensor/elemwise_binary_broadcast_op.h:68: Check failed: l == 1 || r == 1: operands could not be broadcast together with shapes [3] [0]
Stack trace:
  [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7faf5d982ad2]
  [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::BinaryBroadcastShape(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*)+0x3e9) [0x7faf5d9860c9]
  [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0xe8a) [0x7faf606c77ba]
  [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x35c) [0x7faf606ce65c]
  [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0xb33) [0x7faf60e72c33]
  [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvokeEx+0x534) [0x7faf60e747a4]
  [bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7fafd0404e20]
  [bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7fafd040488b]
  [bt] (8) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7fafd03ff01a]
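For readers unfamiliar with the `l == 1 || r == 1` check in the message above: it matches the usual NumPy-style broadcasting rule, under which two dimensions are compatible only if they are equal or one of them is 1. The sketch below is a plain-Python illustration of that rule, not MXNet's `BinaryBroadcastShape`; the function name `broadcast_shape` is hypothetical. It shows why shapes `[3]` and `[0]` are rejected.

```python
from itertools import zip_longest


def broadcast_shape(lhs, rhs):
    """Return the broadcast result shape of two shapes, or raise ValueError.

    Illustrative only: a NumPy-style rule, not MXNet's implementation.
    """
    out = []
    # Compare dimensions from the trailing axis; a missing leading axis counts as 1.
    for l, r in zip_longest(reversed(lhs), reversed(rhs), fillvalue=1):
        if l != r and l != 1 and r != 1:
            raise ValueError(
                "operands could not be broadcast together with shapes "
                f"{list(lhs)} {list(rhs)}"
            )
        # When the dimensions differ, the one that is not 1 determines the output.
        out.append(r if l == 1 else l)
    return tuple(reversed(out))


if __name__ == "__main__":
    print(broadcast_shape((2, 3), (3,)))
    try:
        broadcast_shape((3,), (0,))  # same shapes as in the log above
    except ValueError as err:
        print("rejected:", err)
```

An empty array has shape `[0]`, and 0 is neither equal to 3 nor equal to 1, so the pair fails the check.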

Process SpawnProcess-6:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/work/mxnet/tests/python/gpu/test_operator_gpu.py", line 2215, in kernel_error_check_symbolic
    'b':mx.nd.array([],ctx=mx.gpu(0))})
  File "/work/mxnet/tests/python/unittest/../../../python/mxnet/symbol/symbol.py", line 1914, in bind
    ctypes.byref(handle)))
  File "/work/mxnet/tests/python/unittest/../../../python/mxnet/base.py", line 254, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator _div0: [00:50:12] src/operator/contrib/tvmop/../../tensor/../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node _div0 at 1-th input: expected [3], got [0]
Stack trace:
  [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f03ae617ad2]
  [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(bool mxnet::op::ElemwiseAttr<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string[abi:cxx11], -1, -1>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)::{lambda(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*)#1}::operator()(std::vector<mxnet::TShape, std::allocator<mxnet::TShape> > const&, unsigned long, char const*) const+0xb92) [0x7f03ae620c72]
  [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(bool mxnet::op::ElemwiseAttr<mxnet::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string[abi:cxx11], -1, -1>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, mxnet::TShape const&)+0x1d4) [0x7f03ae622224]
  [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(bool mxnet::op::ElemwiseShape<2, 1>(nnvm::NodeAttrs const&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*)+0x105d) [0x7f03aee034cd]
  [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x49db7eb) [0x7f03b1b6b7eb]
  [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::exec::InferShape(nnvm::Graph&&, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1a5e) [0x7f03b1b6dcfe]
  [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::Context, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::Context> > > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, mxnet::Executor*, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&)+0x11eb) [0x7f03b1b8540b]
  [bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::Executor::Bind(nnvm::Symbol, mxnet::Context const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::Context, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::Context> > > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, mxnet::Executor*)+0x21b) [0x7f03b1b917ab]
  [bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorBindEX+0xd57) [0x7f03b1b169e7]

ok (7.9246s)
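
A note on reading the log: both tracebacks come from separate spawned processes (`SpawnProcess-5` and `SpawnProcess-6`), and the test line itself ends in `ok`. That is consistent with a test that deliberately triggers an error in a child process and only checks that the child exited with a non-zero code. The sketch below shows that general pattern under stated assumptions; it is not the MXNet test. The helper names `run_isolated` and `expect_child_failure` are hypothetical, and it uses `subprocess` in place of the `multiprocessing` spawn context seen in the log so that it stays self-contained.

```python
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


def expect_child_failure(snippet):
    """Assert that the snippet fails in the child; return its captured traceback."""
    code, stderr = run_isolated(snippet)
    assert code != 0, "expected the child process to fail, but it exited cleanly"
    return stderr


if __name__ == "__main__":
    # The child raises, the parent survives and can inspect the traceback text.
    trace = expect_child_failure("raise ValueError('shape mismatch')")
    print("child failed as expected; last line:", trace.strip().splitlines()[-1])
```

With a pattern like this, a traceback in the CI output does not by itself mean the test failed, since the parent decides the outcome from the exit code.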

mxnet-label-bot commented 4 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended label(s): Test

ChaiBapchya commented 4 years ago

@mxnet-label-bot add [Test]

ChaiBapchya commented 4 years ago

#18785

Builds 14 and 15:
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-18785/14/pipeline
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-18785/15/pipeline