apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Segmentation fault in module mxnet.np with mismatched dimensions #16889

Closed gpolov-personal closed 4 years ago

gpolov-personal commented 4 years ago

Description

When using the np module from mxnet, any operation that would raise a ValueError in NumPy ("operands could not be broadcast together with shapes ...") instead crashes the interpreter with "Segmentation fault: 11".
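For readers unfamiliar with the error mentioned above, here is a minimal pure-Python sketch of the broadcasting compatibility rule that NumPy applies to operand shapes. The function name `broadcast_shape` is hypothetical and the error text is only modelled on NumPy's wording; this is not code from NumPy or MXNet.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Return the broadcast result shape, or raise ValueError on a mismatch.

    Hypothetical helper for illustration: shapes are compared from the
    trailing dimension backwards, and two sizes are compatible when they
    are equal or when one of them is 1.
    """
    result = []
    reversed_shapes = (reversed(tuple(s)) for s in shapes)
    # Missing leading dimensions are treated as size 1.
    for dims in zip_longest(*reversed_shapes, fillvalue=1):
        non_one = {d for d in dims if d != 1}
        if len(non_one) > 1:
            raise ValueError(
                "operands could not be broadcast together with shapes "
                + " ".join(str(tuple(s)) for s in shapes)
            )
        result.append(non_one.pop() if non_one else 1)
    return tuple(reversed(result))


if __name__ == "__main__":
    print(broadcast_shape((5, 3), (3,)))   # compatible shapes
    try:
        broadcast_shape((5, 3), (4,))      # trailing sizes 3 and 4 clash
    except ValueError as exc:
        print("ValueError:", exc)
```

Any pair of shapes that fails this check is the kind of input that, per this report, produced a segmentation fault in the affected mxnet build instead of a Python exception.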

Error Message

Segmentation fault: 11

Stack trace:
  [bt] (0) 1   libmxnet.so                         0x0000000116211740 mxnet::Storage::Get() + 7968
  [bt] (1) 2   libsystem_platform.dylib            0x00007fff5e90fb5d _sigtramp + 29
  [bt] (2) 3   ???                                 0x00007ffeebd20468 0x0 + 140732854830184
  [bt] (3) 4   libmxnet.so                         0x0000000113cb103e std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2542
  [bt] (4) 5   libmxnet.so                         0x0000000114b92041 mxnet::op::FullyConnectedComputeExCPU(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&) + 11458385
  [bt] (5) 6   libmxnet.so                         0x0000000114b9262f mxnet::op::FullyConnectedComputeExCPU(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&) + 11459903
  [bt] (6) 7   libmxnet.so                         0x0000000115b7de6f mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*) + 1583
  [bt] (7) 8   libmxnet.so                         0x0000000115b7c96c mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&) + 716
  [bt] (8) 9   libmxnet.so                         0x0000000115abf5ce SetNDInputsOutputs(nnvm::Op const*, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> >*, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> >*, int, void* const*, int*, int, int, void***) + 1582

To Reproduce

Create an environment, install Python 3.7 and pip, and install mxnet with pip (pip install mxnet==1.6.0b20190915; I am on a Mac). Then, directly in a Python interactive session, run any np instruction that would raise an exception in NumPy (ValueError: operands could not be broadcast together with shapes).

Steps to reproduce

With miniconda3 installed, run the following commands:

  1. conda create --name mxnet_test
  2. conda activate mxnet_test
  3. conda install python=3.7 pip
  4. pip install mxnet==1.6.0b20190915
  5. python
  6. 
    >>> from mxnet import np, npx
    >>> npx.set_np()
    >>> B = np.arange(20).reshape(5,3)
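
Because a segmentation fault kills the whole interpreter (or Jupyter kernel), it can help to run a suspect snippet in a child process and inspect its exit status. The sketch below assumes a POSIX system, where a negative return code means the child was terminated by a signal. The helper names `run_isolated` and `describe` are hypothetical, and the `REPRO` string is an illustrative mismatched-shape operation, not the exact statement from this report.

```python
import signal
import subprocess
import sys

# Illustrative snippet (an assumption, not taken verbatim from the report):
# adds two arrays whose shapes cannot be broadcast together.
REPRO = """
from mxnet import np, npx
npx.set_np()
a = np.ones((5, 3))
b = np.ones((4,))
print(a + b)
"""


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Summarise a child's exit status in words."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, -N means the child was killed by signal N.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status {}".format(returncode)


if __name__ == "__main__":
    code, err = run_isolated(REPRO)
    print(describe(code))
    if err:
        # Show only the last line of the traceback or crash message.
        print(err.strip().splitlines()[-1])
```

With a fixed build the child should exit with a normal Python exception (status 1); with an affected build the status would be expected to indicate SIGSEGV.
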
## What have you tried to solve it?

1. I have installed it in multiple environments, with and without the d2l-ai and d2l packages (I am starting the course).
2. Tried it from a Jupyter notebook and directly in the Python interpreter.
3. Checked that this doesn't happen with NumPy.

## Environment

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python


----------Python Info----------
Version      : 3.7.5
Compiler     : Clang 4.0.1 (tags/RELEASE_401/final)
Build        : ('default', 'Oct 25 2019 10:52:18')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /Users/gonzalopolo/.conda/envs/mxnet_test/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /Users/gonzalopolo/.conda/envs/mxnet_test/lib/python3.7/site-packages/mxnet
Num GPUs     : 0
Commit Hash  : e9e267ef711261f20528d443f38eb7b9e991057c
----------System Info----------
Platform     : Darwin-18.7.0-x86_64-i386-64bit
system       : Darwin
node         : Gonzalos-MacBook-Pro.local
release      : 18.7.0
version      : Darwin Kernel Version 18.7.0: Sat Oct 12 00:02:19 PDT 2019; root:xnu-4903.278.12~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT SGXLC MDCLEAR TSXFA IBRS STIBP L1DF SSBD'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0424 sec, LOAD: 0.6233 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0005 sec, LOAD: 0.6125 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 1.0599 sec, LOAD: 0.2111 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0373 sec, LOAD: 0.3267 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0440 sec, LOAD: 0.4644 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0506 sec, LOAD: 0.8738 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0339 sec, LOAD: 0.5021 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0264 sec, LOAD: 0.2297 sec.

sxjscience commented 4 years ago

@GPoloVera I cannot reproduce it with the master version. Actually, I've checked the exception here in the test: https://github.com/apache/incubator-mxnet/pull/16180/files

reminisce commented 4 years ago

Can you try pip install mxnet --pre to install the latest wheel? It should have been fixed after 9/15.
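
As a rough way to tell whether an installed pre-release wheel predates the date mentioned above, the build date can be read from the version string as pip reports it (for example via pip show mxnet). This is only a sketch: the helper names are hypothetical, it assumes nightly versions end in bYYYYMMDD as in 1.6.0b20190915, and the 2019-09-15 cutoff is taken from the "after 9/15" remark rather than from a confirmed fix date.

```python
import re
from datetime import date

# Assumption: nightly wheels carry a trailing "bYYYYMMDD" build date.
NIGHTLY_PATTERN = re.compile(r"b(\d{4})(\d{2})(\d{2})$")

# Assumption: taken from the "fixed after 9/15" remark in this thread.
CUTOFF = date(2019, 9, 15)


def nightly_build_date(version):
    """Return the build date encoded in the version string, or None."""
    match = NIGHTLY_PATTERN.search(version)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day)


def predates_cutoff(version, cutoff=CUTOFF):
    """True if the version is a nightly built on or before the cutoff."""
    built = nightly_build_date(version)
    return built is not None and built <= cutoff


if __name__ == "__main__":
    for version in ("1.6.0b20190915", "1.6.0"):
        print(version, nightly_build_date(version), predates_cutoff(version))
```

A version string without a date suffix yields None, so the check cannot say anything about such builds.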

gpolov-personal commented 4 years ago

@reminisce, that solved the problem, thank you very much!!

The thing is that the version is still 1.6.0. The Dive into Deep Learning course, which uses mxnet, says to install it this way:

pip install mxnet==1.6.0b20190915

Does this problem only occur in some setups (on a Mac, for example), or is it a bug in the 0b20190915 "subversion"? It seems pretty weird that nobody has reported it before.

reminisce commented 4 years ago

@GPoloVera D2L will depend on MXNet v1.6.0 eventually. I encourage you to always use the latest nightly wheel to run D2L so that you get the latest features and bug fixes. If you spot something broken, feel free to file issues so that we can still fix them before v1.6.0 is released.