apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.73k stars 6.81k forks source link

topk/mshadow::range returns wrong result #8303

Closed eric-haibin-lin closed 6 years ago

eric-haibin-lin commented 6 years ago

Note: Providing complete information in the most concise form is the best way to get help. This issue template serves as the checklist for essential information to most of the technical issues.

If the issue is non-technical, feel free to present the information in what you believe is the best form.

Description

(Brief description of the problem in no more than 2 sentences.)

Environment info (Required)

What to do:
1. Download the diagnosis script from https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
2. Run the script using `python diagnose.py` and paste its output here.

Package used (Python/R/Scala/Julia): python2 (I'm using ...)

For R user, please provide R sessionInfo():

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):

MXNet commit hash: ffa6e45aad4eeca8e6d27764789cd615d132fcb9 (Paste the output of git rev-parse HEAD here.)

Build config: (Paste the content of config.mk, or the build command.)

Error Message:

(Paste the complete error message, including stack trace.)

[22:41:18] /home/ubuntu/haibin-mxnet/dmlc-core/include/dmlc/./logging.h:308: [22:41:18] /home/ubuntu/haibin-mxnet/mshadow/mshadow/./tensor_cpu-inl.h:195: Check failed: eshape[0] ==
0 || eshape == dshape Assignment: Shape of Tensors are not consistent with target, eshape: (54686456,) dshape:(54686454,)

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3f) [0x7f717cd11b4f]
[bt] (1) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6MapExpINS_2sv6savetoENS_6TensorINS_3cpuELi1EiEELi1EiNS_4expr8RangeExpIiEELi1EEEvPNS_7TRValueIT0_S4$
XT1_ET2_EERKNS6_3ExpIT3_SB_XT4_EEE+0x1ab) [0x7f717dc00e92]
[bt] (2) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4expr9ExpEngineINS_2sv6savetoENS_6TensorINS_3cpuELi1EiEEiE4EvalINS0_8RangeExpIiEEEEvPS6_RKNS0_3ExpI$
_iLi1EEE+0x23) [0x7f717dbfee0d]
[bt] (3) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4expr9RValueExpINS_6TensorINS_3cpuELi1EiEEiE8__assignINS0_8RangeExpIiEELi1EEERS4_RKNS0_3ExpIT_iXT0_$
EE+0x37) [0x7f717dbfe4e9]
[bt] (4) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6TensorINS_3cpuELi1EiEaSINS_4expr8RangeExpIiEELi1EEERS2_RKNS4_3ExpIT_iXT0_EEE+0x23) [0x7f717dbfa5c3$
[bt] (5) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op8TopKImplIN7mshadow3cpuEEEvNS_10RunContextENS_8ResourceERKNS_5TBlobERKSt6vectorIS6_SaIS6_EERKNS0_9$
opKParamE+0xcb5) [0x7f717dc174dc]
[bt] (6) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op4TopKIN7mshadow3cpuEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISC_EERKSB_INS_9$
pReqTypeESaISH_EESG_+0x18b) [0x7f717dc13412]
[bt] (7) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvRKN4nnvm9NodeAttrsERKN5mxnet9OpContextERKSt6vectorINS4_5TBlobESaIS9_EERKS8_INS4_9Op$
eqTypeESaISE_EESD_EPSJ_E9_M_invokeERKSt9_Any_dataS3_S7_SD_SI_SD_+0x91) [0x7f717cd26994]
[bt] (8) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZNKSt8functionIFvRKN4nnvm9NodeAttrsERKN5mxnet9OpContextERKSt6vectorINS4_5TBlobESaIS9_EERKS8_INS4_9OpReqTypeES$
ISE_EESD_EEclES3_S7_SD_SI_SD_+0xa6) [0x7f717e31ba06]
[bt] (9) /home/ubuntu/haibin-mxnet/python/mxnet/../../lib/libmxnet.so(_ZZN5mxnet10imperative12PushFComputeERKSt8functionIFvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobE$
aISA_EERKS9_INS_9OpReqTypeESaISF_EESE_EEPKNS2_2OpES5_RKNS_7ContextERKS9_IPNS_6engine3VarESaISW_EES10_RKS9_INS_8ResourceESaIS11_EERKS9_IPNS_7NDArrayESaIS17_EES1B_RKS9_IjSaIjEESJ_ENK$
lNS_10RunContextENSU_18CallbackOnCompleteEE_clES1G_S1H_+0x1f2) [0x7f717e315a74]

Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)

>>> a = mx.nd.arange(0, 54686454, step=1, repeat=1)
>>> a.shape
(54686454L,)
>>> a.topk(k=54686454)
(error)

Steps to reproduce

(Paste the commands you ran that produced the error.)

What have you tried to solve it?

1. 2.

eric-haibin-lin commented 6 years ago

arange op is Fixed by #8268, but other operators that uses mshadow::range have this issue