apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Using MKL causes C++ layer blow up while running lstm_bucketing example #5314

Closed: sergeykolychev closed this issue 6 years ago

sergeykolychev commented 7 years ago

For bugs or installation issues, please provide the following information. The more information you provide, the more likely people will be able to help you.

WARNING: discarded 89 sentences longer than the largest bucket.
WARNING: discarded 4 sentences longer than the largest bucket.
[01:06:12] /home/ubuntu/mxnet/dmlc-core/include/dmlc/./logging.h:300: [01:06:12] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fa00e11cc1c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2op11MKLConcatOpIN7mshadow3cpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD_+0xc10) [0x7fa00ecfd950]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(+0xec2092) [0x7fa00ed9a092]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7fa00ed5531c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x2e) [0x7fa00ed57bbe]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fa006208c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fa01cadf6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa01c81582d]

[01:06:12] /home/ubuntu/mxnet/dmlc-core/include/dmlc/./logging.h:300: [01:06:12] src/engine/./threaded_engine.h:336: [01:06:12] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fa00e11cc1c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2op11MKLConcatOpIN7mshadow3cpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD_+0xc10) [0x7fa00ecfd950]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(+0xec2092) [0x7fa00ed9a092]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7fa00ed5531c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x2e) [0x7fa00ed57bbe]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fa006208c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fa01cadf6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa01c81582d]

Environment info

Operating System: Linux Ubuntu 16.04
Compiler: gcc 4.8
Package used (Python/R/Scala/Julia): Python
MXNet version: 0.9.4
Or if installed from source:

MXNet commit hash (git rev-parse HEAD): 55bb4cd2e06c24b46664ac708150e2283e9695c3

If you are using the Python package, please provide

Python version and distribution: python2.7

If you are using the R package, please provide

R sessionInfo():

Error Message:

Please paste the full error message, including stack trace.

Minimum reproducible example

If you are using your own code, please provide a short script that reproduces the error.

~/mxnet/example/rnn$ python lstm_bucketing.py

Steps to reproduce

Or if you are running standard examples, please provide the commands you have run that lead to the error.

  1. Compile with USE_BLAS=mkl
  2. Run: mxnet/example/rnn$ python lstm_bucketing.py

What have you tried to solve it?

1. 2. 3.

piiswrong commented 7 years ago

@glingyan @zhenlinluo

glingyan commented 7 years ago

Found the problem: the concat here has 20 inputs, while the current MKL API limit is 8. I will report this to the MKL team. The workaround is:

$ git diff src/
diff --git a/src/operator/concat.cc b/src/operator/concat.cc
index fc54123..d13106e 100644
--- a/src/operator/concat.cc
+++ b/src/operator/concat.cc
@@ -18,7 +18,8 @@ template<>
 Operator *CreateOp<cpu>(ConcatParam param, int dtype) {
   Operator *op = NULL;
 #if MXNET_USE_MKL2017 == 1
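
The limit can also be hit without the full example. A minimal sketch, assuming the MXNet 0.9.x symbol API (mx.sym.Concat with num_args; names may differ across versions); the unrolled LSTM in lstm_bucketing.py concatenates the outputs of all time steps, which is where the many-input concat comes from:

# Hedged sketch: trigger the MKL concat input limit in isolation on a CPU build.
import mxnet as mx

inputs = [mx.sym.Variable('x%d' % i) for i in range(20)]      # 20 inputs > MKL's limit of 8
concat = mx.sym.Concat(*inputs, num_args=len(inputs), dim=1)

args = {'x%d' % i: mx.nd.ones((4, 8), ctx=mx.cpu()) for i in range(20)}
exe = concat.bind(mx.cpu(), args)
exe.forward()   # on an MKL2017 build, this is where "Check failed: e == E_SUCCESS" fires
print(exe.outputs[0].shape)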

sergeykolychev commented 7 years ago

@glingyan Thank you! I am training a char-level RNN on an Intel Xeon E5-2666 v3 (Haswell); MXNet compiled without MKL processes 3.46 samples/sec, and with MKL the speed is about 220 samples/sec! I feel like this workaround needs to be in master until MKL supports more concat inputs. @piiswrong In my tests, MKL gives better or equal results on Linux compared to the 'apple' BLAS on OSX (which is also extremely fast). OpenBLAS and plain BLAS are slower by at least an order of magnitude. Maybe it would make sense to recommend that users compile MXNet with MKL instead of OpenBLAS on this page: http://mxnet.io/get_started/ubuntu_setup.html ?
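
To put numbers like these on a comparable footing, a single-op benchmark helps. A minimal sketch, assuming only mx.nd.dot; run it unchanged against builds compiled with USE_BLAS=mkl, openblas, or apple and compare (the matrix size and iteration count are arbitrary):

# Hedged sketch: rough GEMM throughput check for comparing BLAS backends.
import time
import mxnet as mx

n = 2048
a = mx.nd.ones((n, n), ctx=mx.cpu())
b = mx.nd.ones((n, n), ctx=mx.cpu())
mx.nd.dot(a, b).wait_to_read()            # warm-up
start = time.time()
for _ in range(10):
    mx.nd.dot(a, b).wait_to_read()
elapsed = time.time() - start
print('%.1f GFLOP/s' % (10 * 2.0 * n ** 3 / elapsed / 1e9))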

glingyan commented 7 years ago

@sergeykolychev Sure, there will be a big upstream patch in the coming days.

piiswrong commented 7 years ago

Does MKL work on Mac? If it does, then we should change the tutorial.

@glingyan Do we need the user to have a full MKL installation to use USE_BLAS=mkl?

sergeykolychev commented 7 years ago

@piiswrong I tried to use MKL on Mac and it did not compile; it has a different API compared to the mklml we use on Linux. I was also unable to find mklml for Mac, though I saw reports on Google that some individuals were able to compile mklml from source on OSX; I have not pursued that route yet. However, it seems that the stock BLAS from Apple is on par with MKL and can continue to be used on OSX. The tutorial page I referred to is specifically for Ubuntu, so it can be changed independently of the Mac-related tutorials.

piiswrong commented 7 years ago

I think BLAS defaults to apple in osx.mk. Or at least it used to.

sergeykolychev commented 7 years ago

@piiswrong Yes, it does default to 'apple' on OSX, which is correct behavior. However, we are probably doing Linux users a disservice by not defaulting to mklml. Even if widespread usage of MKL leads to some issues, that's a good thing, because they'll get fixed quickly, seeing how responsive @glingyan is.

sergeykolychev commented 7 years ago

@piiswrong, @glingyan I want to apologize and correct myself: the 3.46 samples/sec I was getting with OpenBLAS was caused by problems on my end, not by OpenBLAS. MKL is still faster than OpenBLAS, but the difference is not drastic; it's more like 120 vs 200. What's more, I see my char-RNN network converging quickly and reliably with OpenBLAS, while with MKL it reaches a high plateau and stops converging. Also, performance with MKL reliably drops by about 75% in the middle of the second epoch; I can replicate this consistently. It seems like there's some bug in the MKL implementation. I'll try to write a Python example over the weekend to prove that. (My current code is in Perl, so it's not really to be trusted at this point.)

glingyan commented 7 years ago

@sergeykolychev There will be a fix for convergence on some models tonight or tomorrow; please wait for my patch, upstream testing is ongoing. If the patch fails, I will help you debug.

sergeykolychev commented 7 years ago

@glingyan thank you, will wait.

glingyan commented 7 years ago

@sergeykolychev Please check the preview at https://github.com/glingyan/mxnet

glingyan commented 7 years ago

@zhenlinluo for the MKL-on-Mac issue.

sergeykolychev commented 7 years ago

@glingyan The issues are not fixed; here is what I see. My code is a really basic char-LSTM RNN network and the data is tiny Shakespeare. It's written in Perl, but frankly I do not think that matters. This is the output of my code with USE_BLAS=mkl, compiled from your master:

$ ./char_lstm.pl 
Epoch[0] Batch [50] Speed: 218.15 samples/sec   Train-Perplexity=22.038119 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [100]    Speed: 217.41 samples/sec   Train-Perplexity=14.247312 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [150]    Speed: 216.87 samples/sec   Train-Perplexity=13.642289 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [200]    Speed: 217.11 samples/sec   Train-Perplexity=13.410031 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [250]    Speed: 217.15 samples/sec   Train-Perplexity=12.963284 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [300]    Speed: 217.47 samples/sec   Train-Perplexity=12.734377 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [350]    Speed: 217.86 samples/sec   Train-Perplexity=12.310390 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [400]    Speed: 219.83 samples/sec   Train-Perplexity=12.098077 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [450]    Speed: 219.63 samples/sec   Train-Perplexity=12.117380 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [500]    Speed: 211.98 samples/sec   Train-Perplexity=11.890713 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [550]    Speed: 198.78 samples/sec   Train-Perplexity=11.584888 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [600]    Speed: 189.23 samples/sec   Train-Perplexity=11.388555 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [650]    Speed: 187.45 samples/sec   Train-Perplexity=11.326587 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [700]    Speed: 189.07 samples/sec   Train-Perplexity=11.295736 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [750]    Speed: 200.36 samples/sec   Train-Perplexity=11.263378 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [800]    Speed: 215.10 samples/sec   Train-Perplexity=11.140880 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [850]    Speed: 220.77 samples/sec   Train-Perplexity=11.090139 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [900]    Speed: 220.96 samples/sec   Train-Perplexity=11.052934 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [950]    Speed: 220.78 samples/sec   Train-Perplexity=10.915363 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1000]   Speed: 221.17 samples/sec   Train-Perplexity=10.952525 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1050]   Speed: 221.20 samples/sec   Train-Perplexity=10.986085 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Train-Perplexity=10.845770 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Time cost=164.381 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [50] Speed: 221.72 samples/sec   Train-Perplexity=11.260175 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [100]    Speed: 221.25 samples/sec   Train-Perplexity=10.971702 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [150]    Speed: 215.08 samples/sec   Train-Perplexity=10.753926 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [200]    Speed: 194.21 samples/sec   Train-Perplexity=10.765214 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [250]    Speed: 164.44 samples/sec   Train-Perplexity=10.594932 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [300]    Speed: 125.37 samples/sec   Train-Perplexity=10.773817 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [350]    Speed: 75.13 samples/sec    Train-Perplexity=10.838403 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [400]    Speed: 33.06 samples/sec    Train-Perplexity=10.564098 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [450]    Speed: 14.29 samples/sec    Train-Perplexity=10.620187 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [500]    Speed: 8.02 samples/sec Train-Perplexity=10.499926 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [550]    Speed: 6.13 samples/sec Train-Perplexity=10.723671 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.

And this is the same code with USE_BLAS=openblas:

./char_lstm.pl 
Epoch[0] Batch [50] Speed: 116.49 samples/sec   Train-Perplexity=25.415275 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [100]    Speed: 115.85 samples/sec   Train-Perplexity=11.252742 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [150]    Speed: 115.76 samples/sec   Train-Perplexity=9.379204 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [200]    Speed: 115.67 samples/sec   Train-Perplexity=8.617612 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [250]    Speed: 115.80 samples/sec   Train-Perplexity=7.716166 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [300]    Speed: 115.91 samples/sec   Train-Perplexity=7.254024 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [350]    Speed: 115.95 samples/sec   Train-Perplexity=6.983290 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [400]    Speed: 115.73 samples/sec   Train-Perplexity=6.731437 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [450]    Speed: 115.62 samples/sec   Train-Perplexity=6.464352 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [500]    Speed: 115.75 samples/sec   Train-Perplexity=6.426460 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [550]    Speed: 116.06 samples/sec   Train-Perplexity=6.127249 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [600]    Speed: 116.28 samples/sec   Train-Perplexity=6.209482 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [650]    Speed: 116.49 samples/sec   Train-Perplexity=5.889429 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [700]    Speed: 116.33 samples/sec   Train-Perplexity=6.064267 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [750]    Speed: 116.34 samples/sec   Train-Perplexity=5.679001 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [800]    Speed: 116.55 samples/sec   Train-Perplexity=5.825945 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [850]    Speed: 116.67 samples/sec   Train-Perplexity=5.764707 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [900]    Speed: 116.09 samples/sec   Train-Perplexity=5.535110 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [950]    Speed: 115.97 samples/sec   Train-Perplexity=5.466780 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1000]   Speed: 115.94 samples/sec   Train-Perplexity=5.636686 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1050]   Speed: 116.10 samples/sec   Train-Perplexity=5.457590 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Train-Perplexity=5.296764 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Time cost=300.178 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [50] Speed: 115.82 samples/sec   Train-Perplexity=5.349508 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [100]    Speed: 115.90 samples/sec   Train-Perplexity=5.343474 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [150]    Speed: 116.04 samples/sec   Train-Perplexity=5.330013 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [200]    Speed: 115.77 samples/sec   Train-Perplexity=5.214058 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [250]    Speed: 115.81 samples/sec   Train-Perplexity=5.051733 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [300]    Speed: 116.05 samples/sec   Train-Perplexity=5.270308 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [350]    Speed: 115.78 samples/sec   Train-Perplexity=5.246875 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [400]    Speed: 115.42 samples/sec   Train-Perplexity=5.383535 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [450]    Speed: 115.43 samples/sec   Train-Perplexity=5.192108 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [500]    Speed: 115.56 samples/sec   Train-Perplexity=5.231138 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [550]    Speed: 115.44 samples/sec   Train-Perplexity=5.204995 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [600]    Speed: 115.41 samples/sec   Train-Perplexity=5.093277 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [650]    Speed: 115.50 samples/sec   Train-Perplexity=5.244379 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [700]    Speed: 115.42 samples/sec   Train-Perplexity=5.104197 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [750]    Speed: 115.28 samples/sec   Train-Perplexity=5.007315 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [800]    Speed: 114.98 samples/sec   Train-Perplexity=4.979710 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [850]    Speed: 115.50 samples/sec   Train-Perplexity=4.895686 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [900]    Speed: 115.41 samples/sec   Train-Perplexity=5.103383 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [950]    Speed: 115.41 samples/sec   Train-Perplexity=5.112778 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [1000]   Speed: 115.39 samples/sec   Train-Perplexity=4.990302 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [1050]   Speed: 115.69 samples/sec   Train-Perplexity=4.994740 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.

As you can see, when MKL is used the run starts twice as fast as with OpenBLAS, but in the middle of the second epoch it slows to a crawl, and the perplexity metric gets stuck at ~11. (Incidentally, I once had a bug in the Perl layer that essentially failed to link states between LSTM sequence layers, and my network got stuck at the same ~11; I wonder if there is a similar bug in the MKL implementation.) With OpenBLAS, running the same exact code (the only difference is that mshadow uses OpenBLAS instead of MKL), performance starts at half the speed but never degrades, and the network converges quickly and reliably, with the perplexity metric going down to ~5 as it should (the Julia examples reach the same number). And when I run the same code on OSX with USE_BLAS=apple, I get the same exact results as with OpenBLAS on Linux.
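
One way to check whether the degradation lives in the MKL concat path alone is to time an isolated many-input concat over repeated batches. A minimal sketch, assuming the MXNet 0.9.x imperative API (mx.nd.Concat; the name and signature may differ by version):

# Hedged sketch: repeatedly run a 20-input CPU concat and watch for the
# throughput decay seen in the training runs above.
import time
import mxnet as mx

data = [mx.nd.ones((32, 256), ctx=mx.cpu()) for _ in range(20)]
for round_idx in range(5):
    start = time.time()
    for _ in range(1000):
        out = mx.nd.Concat(*data, num_args=len(data), dim=1)
        out.wait_to_read()
    print('round %d: %.1f concats/sec' % (round_idx, 1000 / (time.time() - start)))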

glingyan commented 7 years ago

@sergeykolychev Why was this closed? Is it not a problem for you anymore?

sergeykolychev commented 7 years ago

@glingyan Sorry, I thought you did not need it anymore; of course the issue still exists if you have not added new code since that preview. I have moved my calculations to a GPU box, though, so I'm not really concerned with MKL right now.

glingyan commented 7 years ago

@sergeykolychev I will help debug, but where should I set up the environment? Or is example/rnn enough?

sergeykolychev commented 7 years ago

@glingyan Thanks! Here is a minimal example that you can use to debug the problem. Make the change below to enable the Adam optimizer for this example; Adam does not take a 'momentum' parameter, so it has to be dropped from optimizer_params (see the sketch after the diff):

diff --git a/example/rnn/lstm_bucketing.py b/example/rnn/lstm_bucketing.py
index 4bc934a..7ab3c95 100644
--- a/example/rnn/lstm_bucketing.py
+++ b/example/rnn/lstm_bucketing.py
@@ -100,7 +100,6 @@ if __name__ == '__main__':
         kvstore             = args.kv_store,
         optimizer           = args.optimizer,
         optimizer_params    = { 'learning_rate': args.lr,
-                                'momentum': args.mom,
                                 'wd': args.wd },
         initializer         = mx.init.Xavier(factor_type="in", magnitude=2.34),
         num_epoch           = args.num_epochs,
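
The reason the momentum line has to go: Adam in MXNet is parameterized by beta1/beta2 rather than momentum, so leaving 'momentum' in optimizer_params makes fit() fail when it constructs the optimizer. A minimal sketch (the values are placeholders):

# Hedged sketch: Adam takes beta1/beta2, not momentum.
import mxnet as mx

opt = mx.optimizer.create('adam', learning_rate=0.01, wd=0.00001)   # fine
# mx.optimizer.create('adam', momentum=0.9)
# -> TypeError: __init__() got an unexpected keyword argument 'momentum'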

From the output below you can see that in the middle of the second epoch the performance starts to degrade and the network does not converge; with OpenBLAS the perplexity would be under 200 by the middle of the second epoch.

:~/mxnet/example/rnn$ python lstm_bucketing.py --optimizer adam
WARNING: discarded 89 sentences longer than the largest bucket.
WARNING: discarded 4 sentences longer than the largest bucket.
2017-03-13 07:42:59,877 Epoch[0] Batch [50] Speed: 134.00 samples/sec   Train-Perplexity=2966.442802
2017-03-13 07:43:12,753 Epoch[0] Batch [100]    Speed: 124.27 samples/sec   Train-Perplexity=1133.826717
2017-03-13 07:43:27,474 Epoch[0] Batch [150]    Speed: 108.69 samples/sec   Train-Perplexity=1011.551690
2017-03-13 07:43:40,795 Epoch[0] Batch [200]    Speed: 120.12 samples/sec   Train-Perplexity=986.641991
2017-03-13 07:43:54,045 Epoch[0] Batch [250]    Speed: 120.76 samples/sec   Train-Perplexity=1043.929538
2017-03-13 07:44:07,208 Epoch[0] Batch [300]    Speed: 121.55 samples/sec   Train-Perplexity=1026.777324
2017-03-13 07:44:20,063 Epoch[0] Batch [350]    Speed: 124.47 samples/sec   Train-Perplexity=1003.001102
2017-03-13 07:44:32,424 Epoch[0] Batch [400]    Speed: 129.45 samples/sec   Train-Perplexity=1037.213538
2017-03-13 07:44:44,309 Epoch[0] Batch [450]    Speed: 134.62 samples/sec   Train-Perplexity=919.465923
2017-03-13 07:44:57,358 Epoch[0] Batch [500]    Speed: 122.63 samples/sec   Train-Perplexity=803.629447
2017-03-13 07:45:11,036 Epoch[0] Batch [550]    Speed: 116.98 samples/sec   Train-Perplexity=731.199299
2017-03-13 07:45:24,050 Epoch[0] Batch [600]    Speed: 122.94 samples/sec   Train-Perplexity=767.911830
2017-03-13 07:45:37,369 Epoch[0] Batch [650]    Speed: 120.13 samples/sec   Train-Perplexity=777.475126
2017-03-13 07:45:50,049 Epoch[0] Batch [700]    Speed: 126.19 samples/sec   Train-Perplexity=735.073373
2017-03-13 07:46:02,074 Epoch[0] Batch [750]    Speed: 133.07 samples/sec   Train-Perplexity=683.973815
2017-03-13 07:46:14,973 Epoch[0] Batch [800]    Speed: 124.04 samples/sec   Train-Perplexity=648.091700
2017-03-13 07:46:28,155 Epoch[0] Batch [850]    Speed: 121.39 samples/sec   Train-Perplexity=610.641153
2017-03-13 07:46:41,153 Epoch[0] Batch [900]    Speed: 123.10 samples/sec   Train-Perplexity=615.271286
2017-03-13 07:46:54,415 Epoch[0] Batch [950]    Speed: 120.64 samples/sec   Train-Perplexity=580.477461
2017-03-13 07:47:07,536 Epoch[0] Batch [1000]   Speed: 121.95 samples/sec   Train-Perplexity=595.476506
2017-03-13 07:47:19,853 Epoch[0] Batch [1050]   Speed: 129.90 samples/sec   Train-Perplexity=591.306123
2017-03-13 07:47:32,885 Epoch[0] Batch [1100]   Speed: 122.78 samples/sec   Train-Perplexity=604.687834
2017-03-13 07:47:46,234 Epoch[0] Batch [1150]   Speed: 119.86 samples/sec   Train-Perplexity=606.537159
2017-03-13 07:48:00,723 Epoch[0] Batch [1200]   Speed: 110.43 samples/sec   Train-Perplexity=596.381659
2017-03-13 07:48:18,747 Epoch[0] Batch [1250]   Speed: 88.77 samples/sec    Train-Perplexity=579.731480
2017-03-13 07:48:37,840 Epoch[0] Batch [1300]   Speed: 83.80 samples/sec    Train-Perplexity=568.241532
2017-03-13 07:48:40,894 Epoch[0] Train-Perplexity=561.096900
2017-03-13 07:48:40,895 Epoch[0] Time cost=353.188
2017-03-13 07:48:54,760 Epoch[0] Validation-Perplexity=516.515529
2017-03-13 07:49:16,767 Epoch[1] Batch [50] Speed: 73.92 samples/sec    Train-Perplexity=496.200794
2017-03-13 07:49:38,937 Epoch[1] Batch [100]    Speed: 72.17 samples/sec    Train-Perplexity=486.276388
2017-03-13 07:50:06,259 Epoch[1] Batch [150]    Speed: 58.56 samples/sec    Train-Perplexity=474.950573
2017-03-13 07:50:48,366 Epoch[1] Batch [200]    Speed: 38.00 samples/sec    Train-Perplexity=478.531360
2017-03-13 07:51:57,539 Epoch[1] Batch [250]    Speed: 23.13 samples/sec    Train-Perplexity=484.835826
2017-03-13 07:53:58,770 Epoch[1] Batch [300]    Speed: 13.20 samples/sec    Train-Perplexity=522.462073

yajiedesign commented 6 years ago

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!