Closed sergeykolychev closed 6 years ago
@glingyan @zhenlinluo
Found the problem: the concat has 20 inputs, but the current MKL API limit is only 8.
Will report this to the MKL team.
The workaround is:
$ git diff src/
diff --git a/src/operator/concat.cc b/src/operator/concat.cc
index fc54123..d13106e 100644
--- a/src/operator/concat.cc
+++ b/src/operator/concat.cc
@@ -18,7 +18,8 @@ template<>
Operator CreateOp
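For illustration only (this is not the actual patch, which touches src/operator/concat.cc and is truncated above): the shape of such a workaround, batching calls to a primitive capped at 8 inputs, can be sketched in plain Python. `concat8` and `concat_any` are hypothetical names standing in for the limited MKL call and a wrapper around it.

```python
def concat8(*parts):
    """Stand-in for an API that accepts at most 8 inputs (like the MKL concat)."""
    if len(parts) > 8:
        raise ValueError("API limitation: at most 8 concat inputs")
    out = []
    for p in parts:
        out.extend(p)
    return out

def concat_any(parts, limit=8):
    """Concatenate arbitrarily many inputs by repeatedly merging
    at most `limit` of them per call to the limited primitive."""
    while len(parts) > 1:
        parts = [concat8(*parts[i:i + limit]) for i in range(0, len(parts), limit)]
    return parts[0]

inputs = [[i] for i in range(20)]  # 20 concat inputs, like the failing RNN graph
print(concat_any(inputs))  # the 20 inputs merged in order
```

Calling `concat8` directly with all 20 inputs raises the limit error; the wrapper stays within the cap at every call.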
@glingyan Thank you! I am training a char-level RNN on an Intel Xeon E5-2666 v3 (Haswell). MXNet compiled without MKL processes 3.46 samples/sec, and with MKL the speed is about 220 samples/sec! I feel like this workaround needs to be in master until MKL supports more concat inputs. @piiswrong In my tests MKL gives better or equal results on Linux compared to the 'apple' BLAS on OSX (which is also extremely fast). OpenBLAS and reference BLAS are slower by at least an order of magnitude. Maybe it would make sense to recommend that users compile MXNet with MKL instead of OpenBLAS on this page: http://mxnet.io/get_started/ubuntu_setup.html ?
@sergeykolychev Sure, there will be a big upstream patch in the coming days.
Does MKL work on Mac? If it does, then we should change the tutorial.
@glingyan Do we need the user to have a full MKL installation to use BLAS=mkl?
@piiswrong I tried to use MKL on Mac and it did not compile; it has a different API compared to the mklml that we use on Linux. I was also unable to find mklml for Mac, though I saw reports on Google that some individuals were able to compile mklml from source on OSX; I have not pursued that route yet. However, it seems that the stock BLAS from Apple is on par with MKL and can continue to be used on OSX. The tutorial page I referred to is specifically for Ubuntu, so it can be changed independently of the Mac-related tutorials.
I think BLAS defaults to 'apple' in osx.mk. Or at least it used to.
@piiswrong Yes, it does default to 'apple' on OSX, which is the correct behavior; however, we are probably doing Linux users a disservice by not defaulting to mklml. Even if widespread usage of MKL leads to some issues, that is a good thing, because they will get fixed quickly, seeing how responsive @glingyan is.
@piiswrong @glingyan I want to apologize and correct myself: the 3.46 samples/sec I was getting with OpenBLAS was caused by problems on my end, not by OpenBLAS. MKL is still faster than OpenBLAS, but the difference is not drastic; it's more like 120 vs 200. What's more, I see my char-RNN network converging quickly and reliably on OpenBLAS, while with MKL it reaches a high plateau and does not converge. In addition, performance with MKL reliably drops by about 75% in the middle of the second epoch; I can replicate this every time. It seems like there is some bug in the MKL implementation. I'll try to write a Python example over the weekend to prove that (my current code is in Perl, so it is not really to be trusted at this point).
@sergeykolychev There will be a fix for convergence on some models tonight or tomorrow; please wait for my patch. The upstream test is ongoing. If the patch fails, I will help you debug.
@glingyan thank you, will wait.
@sergeykolychev please check preview at https://github.com/glingyan/mxnet
@zhenlinluo for mkl on MAC issue
@glingyan The issues are not fixed; here is what I see. My code is a really basic char-LSTM RNN network and the data is tiny Shakespeare. It is written in Perl, but frankly I do not think that matters. This is the output of my code with USE_BLAS=mkl, compiled from your master:
$ ./char_lstm.pl
Epoch[0] Batch [50] Speed: 218.15 samples/sec Train-Perplexity=22.038119 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [100] Speed: 217.41 samples/sec Train-Perplexity=14.247312 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [150] Speed: 216.87 samples/sec Train-Perplexity=13.642289 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [200] Speed: 217.11 samples/sec Train-Perplexity=13.410031 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [250] Speed: 217.15 samples/sec Train-Perplexity=12.963284 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [300] Speed: 217.47 samples/sec Train-Perplexity=12.734377 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [350] Speed: 217.86 samples/sec Train-Perplexity=12.310390 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [400] Speed: 219.83 samples/sec Train-Perplexity=12.098077 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [450] Speed: 219.63 samples/sec Train-Perplexity=12.117380 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [500] Speed: 211.98 samples/sec Train-Perplexity=11.890713 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [550] Speed: 198.78 samples/sec Train-Perplexity=11.584888 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [600] Speed: 189.23 samples/sec Train-Perplexity=11.388555 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [650] Speed: 187.45 samples/sec Train-Perplexity=11.326587 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [700] Speed: 189.07 samples/sec Train-Perplexity=11.295736 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [750] Speed: 200.36 samples/sec Train-Perplexity=11.263378 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [800] Speed: 215.10 samples/sec Train-Perplexity=11.140880 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [850] Speed: 220.77 samples/sec Train-Perplexity=11.090139 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [900] Speed: 220.96 samples/sec Train-Perplexity=11.052934 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [950] Speed: 220.78 samples/sec Train-Perplexity=10.915363 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1000] Speed: 221.17 samples/sec Train-Perplexity=10.952525 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1050] Speed: 221.20 samples/sec Train-Perplexity=10.986085 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Train-Perplexity=10.845770 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Time cost=164.381 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [50] Speed: 221.72 samples/sec Train-Perplexity=11.260175 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [100] Speed: 221.25 samples/sec Train-Perplexity=10.971702 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [150] Speed: 215.08 samples/sec Train-Perplexity=10.753926 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [200] Speed: 194.21 samples/sec Train-Perplexity=10.765214 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [250] Speed: 164.44 samples/sec Train-Perplexity=10.594932 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [300] Speed: 125.37 samples/sec Train-Perplexity=10.773817 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [350] Speed: 75.13 samples/sec Train-Perplexity=10.838403 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [400] Speed: 33.06 samples/sec Train-Perplexity=10.564098 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [450] Speed: 14.29 samples/sec Train-Perplexity=10.620187 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [500] Speed: 8.02 samples/sec Train-Perplexity=10.499926 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [550] Speed: 6.13 samples/sec Train-Perplexity=10.723671 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
And this is the same code with USE_BLAS=openblas:
./char_lstm.pl
Epoch[0] Batch [50] Speed: 116.49 samples/sec Train-Perplexity=25.415275 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [100] Speed: 115.85 samples/sec Train-Perplexity=11.252742 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [150] Speed: 115.76 samples/sec Train-Perplexity=9.379204 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [200] Speed: 115.67 samples/sec Train-Perplexity=8.617612 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [250] Speed: 115.80 samples/sec Train-Perplexity=7.716166 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [300] Speed: 115.91 samples/sec Train-Perplexity=7.254024 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [350] Speed: 115.95 samples/sec Train-Perplexity=6.983290 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [400] Speed: 115.73 samples/sec Train-Perplexity=6.731437 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [450] Speed: 115.62 samples/sec Train-Perplexity=6.464352 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [500] Speed: 115.75 samples/sec Train-Perplexity=6.426460 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [550] Speed: 116.06 samples/sec Train-Perplexity=6.127249 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [600] Speed: 116.28 samples/sec Train-Perplexity=6.209482 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [650] Speed: 116.49 samples/sec Train-Perplexity=5.889429 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [700] Speed: 116.33 samples/sec Train-Perplexity=6.064267 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [750] Speed: 116.34 samples/sec Train-Perplexity=5.679001 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [800] Speed: 116.55 samples/sec Train-Perplexity=5.825945 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [850] Speed: 116.67 samples/sec Train-Perplexity=5.764707 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [900] Speed: 116.09 samples/sec Train-Perplexity=5.535110 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [950] Speed: 115.97 samples/sec Train-Perplexity=5.466780 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1000] Speed: 115.94 samples/sec Train-Perplexity=5.636686 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1050] Speed: 116.10 samples/sec Train-Perplexity=5.457590 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Train-Perplexity=5.296764 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Time cost=300.178 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [50] Speed: 115.82 samples/sec Train-Perplexity=5.349508 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [100] Speed: 115.90 samples/sec Train-Perplexity=5.343474 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [150] Speed: 116.04 samples/sec Train-Perplexity=5.330013 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [200] Speed: 115.77 samples/sec Train-Perplexity=5.214058 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [250] Speed: 115.81 samples/sec Train-Perplexity=5.051733 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [300] Speed: 116.05 samples/sec Train-Perplexity=5.270308 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [350] Speed: 115.78 samples/sec Train-Perplexity=5.246875 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [400] Speed: 115.42 samples/sec Train-Perplexity=5.383535 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [450] Speed: 115.43 samples/sec Train-Perplexity=5.192108 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [500] Speed: 115.56 samples/sec Train-Perplexity=5.231138 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [550] Speed: 115.44 samples/sec Train-Perplexity=5.204995 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [600] Speed: 115.41 samples/sec Train-Perplexity=5.093277 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [650] Speed: 115.50 samples/sec Train-Perplexity=5.244379 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [700] Speed: 115.42 samples/sec Train-Perplexity=5.104197 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [750] Speed: 115.28 samples/sec Train-Perplexity=5.007315 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [800] Speed: 114.98 samples/sec Train-Perplexity=4.979710 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [850] Speed: 115.50 samples/sec Train-Perplexity=4.895686 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [900] Speed: 115.41 samples/sec Train-Perplexity=5.103383 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [950] Speed: 115.41 samples/sec Train-Perplexity=5.112778 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [1000] Speed: 115.39 samples/sec Train-Perplexity=4.990302 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [1050] Speed: 115.69 samples/sec Train-Perplexity=4.994740 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
As you can see, when MKL is used it starts twice as fast compared to OpenBLAS, but then in the middle of the second epoch it slows to a crawl, and the perplexity metric gets stuck at ~11. (Incidentally, I had a bug in the Perl layer that essentially failed to link states between LSTM sequence layers, and my network was getting stuck at the same ~11; I wonder if there is a similar bug in the MKL implementation.) Looking at OpenBLAS (running the exact same code; the one difference is that mshadow uses OpenBLAS instead of MKL), performance starts twice as slow but never degrades, and the network converges reliably and quickly, with the perplexity metric going down to ~5 as it should (the Julia examples reach the same number). And when I run the same code on OSX with USE_BLAS=apple, I get the exact same results as with OpenBLAS on Linux.
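For reference on the numbers in these logs: Train-Perplexity is the exponential of the mean per-sample negative log-likelihood, so a perplexity stuck at ~11 means the model is effectively guessing among ~11 equally likely characters. A minimal sketch (a hypothetical helper, not MXNet's metric code):

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity = exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# A model that assigns ~1/11 probability to the correct character every time
# reports a perplexity of 11, matching the plateau seen in the MKL run.
stuck = [math.log(11.0)] * 100
print(round(perplexity(stuck), 2))  # 11.0
```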
@sergeykolychev Why closed? Is it not a problem for you anymore?
@glingyan Sorry, I thought you did not need it anymore; of course the issue still exists if you have not added new code since that preview. Though I have moved my calculations to a GPU box, so I am not really concerned with MKL right now.
@sergeykolychev I will help debug, but where should I set up the environment? Is example/rnn enough?
@glingyan Thanks! Here is a minimal example that you can use to debug the problem. Make this change to the code to enable the Adam optimizer for this example:
diff --git a/example/rnn/lstm_bucketing.py b/example/rnn/lstm_bucketing.py
index 4bc934a..7ab3c95 100644
--- a/example/rnn/lstm_bucketing.py
+++ b/example/rnn/lstm_bucketing.py
@@ -100,7 +100,6 @@ if __name__ == '__main__':
kvstore = args.kv_store,
optimizer = args.optimizer,
optimizer_params = { 'learning_rate': args.lr,
- 'momentum': args.mom,
'wd': args.wd },
initializer = mx.init.Xavier(factor_type="in", magnitude=2.34),
num_epoch = args.num_epochs,
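For context on why the 'momentum' line is removed above: Adam maintains its own first and second moment estimates internally, so the SGD-style momentum argument does not apply. A minimal sketch of one Adam step (standard textbook Adam, not MXNet's exact implementation):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar weight.
    m, v are the running first/second moment estimates; t is the step count."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (Adam's "momentum")
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = adam_step(w=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(round(w, 6))  # 0.999 -- first step moves by roughly lr regardless of grad scale
```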
From the output below you can see that in the middle of the second epoch the performance starts to degrade and the network is not converging; with OpenBLAS, the perplexity would be under 200 by the middle of the second epoch.
~/mxnet/example/rnn$ python lstm_bucketing.py --optimizer adam
WARNING: discarded 89 sentences longer than the largest bucket.
WARNING: discarded 4 sentences longer than the largest bucket.
2017-03-13 07:42:59,877 Epoch[0] Batch [50] Speed: 134.00 samples/sec Train-Perplexity=2966.442802
2017-03-13 07:43:12,753 Epoch[0] Batch [100] Speed: 124.27 samples/sec Train-Perplexity=1133.826717
2017-03-13 07:43:27,474 Epoch[0] Batch [150] Speed: 108.69 samples/sec Train-Perplexity=1011.551690
2017-03-13 07:43:40,795 Epoch[0] Batch [200] Speed: 120.12 samples/sec Train-Perplexity=986.641991
2017-03-13 07:43:54,045 Epoch[0] Batch [250] Speed: 120.76 samples/sec Train-Perplexity=1043.929538
2017-03-13 07:44:07,208 Epoch[0] Batch [300] Speed: 121.55 samples/sec Train-Perplexity=1026.777324
2017-03-13 07:44:20,063 Epoch[0] Batch [350] Speed: 124.47 samples/sec Train-Perplexity=1003.001102
2017-03-13 07:44:32,424 Epoch[0] Batch [400] Speed: 129.45 samples/sec Train-Perplexity=1037.213538
2017-03-13 07:44:44,309 Epoch[0] Batch [450] Speed: 134.62 samples/sec Train-Perplexity=919.465923
2017-03-13 07:44:57,358 Epoch[0] Batch [500] Speed: 122.63 samples/sec Train-Perplexity=803.629447
2017-03-13 07:45:11,036 Epoch[0] Batch [550] Speed: 116.98 samples/sec Train-Perplexity=731.199299
2017-03-13 07:45:24,050 Epoch[0] Batch [600] Speed: 122.94 samples/sec Train-Perplexity=767.911830
2017-03-13 07:45:37,369 Epoch[0] Batch [650] Speed: 120.13 samples/sec Train-Perplexity=777.475126
2017-03-13 07:45:50,049 Epoch[0] Batch [700] Speed: 126.19 samples/sec Train-Perplexity=735.073373
2017-03-13 07:46:02,074 Epoch[0] Batch [750] Speed: 133.07 samples/sec Train-Perplexity=683.973815
2017-03-13 07:46:14,973 Epoch[0] Batch [800] Speed: 124.04 samples/sec Train-Perplexity=648.091700
2017-03-13 07:46:28,155 Epoch[0] Batch [850] Speed: 121.39 samples/sec Train-Perplexity=610.641153
2017-03-13 07:46:41,153 Epoch[0] Batch [900] Speed: 123.10 samples/sec Train-Perplexity=615.271286
2017-03-13 07:46:54,415 Epoch[0] Batch [950] Speed: 120.64 samples/sec Train-Perplexity=580.477461
2017-03-13 07:47:07,536 Epoch[0] Batch [1000] Speed: 121.95 samples/sec Train-Perplexity=595.476506
2017-03-13 07:47:19,853 Epoch[0] Batch [1050] Speed: 129.90 samples/sec Train-Perplexity=591.306123
2017-03-13 07:47:32,885 Epoch[0] Batch [1100] Speed: 122.78 samples/sec Train-Perplexity=604.687834
2017-03-13 07:47:46,234 Epoch[0] Batch [1150] Speed: 119.86 samples/sec Train-Perplexity=606.537159
2017-03-13 07:48:00,723 Epoch[0] Batch [1200] Speed: 110.43 samples/sec Train-Perplexity=596.381659
2017-03-13 07:48:18,747 Epoch[0] Batch [1250] Speed: 88.77 samples/sec Train-Perplexity=579.731480
2017-03-13 07:48:37,840 Epoch[0] Batch [1300] Speed: 83.80 samples/sec Train-Perplexity=568.241532
2017-03-13 07:48:40,894 Epoch[0] Train-Perplexity=561.096900
2017-03-13 07:48:40,895 Epoch[0] Time cost=353.188
2017-03-13 07:48:54,760 Epoch[0] Validation-Perplexity=516.515529
2017-03-13 07:49:16,767 Epoch[1] Batch [50] Speed: 73.92 samples/sec Train-Perplexity=496.200794
2017-03-13 07:49:38,937 Epoch[1] Batch [100] Speed: 72.17 samples/sec Train-Perplexity=486.276388
2017-03-13 07:50:06,259 Epoch[1] Batch [150] Speed: 58.56 samples/sec Train-Perplexity=474.950573
2017-03-13 07:50:48,366 Epoch[1] Batch [200] Speed: 38.00 samples/sec Train-Perplexity=478.531360
2017-03-13 07:51:57,539 Epoch[1] Batch [250] Speed: 23.13 samples/sec Train-Perplexity=484.835826
2017-03-13 07:53:58,770 Epoch[1] Batch [300] Speed: 13.20 samples/sec Train-Perplexity=522.462073
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
For bugs or installation issues, please provide the following information. The more information you provide, the more likely people will be able to help you.
WARNING: discarded 89 sentences longer than the largest bucket.
WARNING: discarded 4 sentences longer than the largest bucket.
[01:06:12] /home/ubuntu/mxnet/dmlc-core/include/dmlc/./logging.h:300: [01:06:12] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fa00e11cc1c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2op11MKLConcatOpIN7mshadow3cpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD_+0xc10) [0x7fa00ecfd950]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(+0xec2092) [0x7fa00ed9a092]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7fa00ed5531c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x2e) [0x7fa00ed57bbe]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fa006208c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fa01cadf6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa01c81582d]

[01:06:12] /home/ubuntu/mxnet/dmlc-core/include/dmlc/./logging.h:300: [01:06:12] src/engine/./threaded_engine.h:336: [01:06:12] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 8 entries: (identical to the trace above)
Environment info
Operating System: Linux Ubuntu 16.04
Compiler: gcc 4.8
Package used (Python/R/Scala/Julia): Python
MXNet version: 0.9.4
MXNet commit hash (git rev-parse HEAD): 55bb4cd2e06c24b46664ac708150e2283e9695c3
Python version and distribution: Python 2.7
Error Message:
See the full stack trace above.
Minimum reproducible example / Steps to reproduce:
~/mxnet/example/rnn$ python lstm_bucketing.py