Question about LSTM implementation: Perplexity convergence differs between lstm_bucketing.py and rnn_cell_demo.py

YujiOshima commented 7 years ago

For bugs or installation issues, please provide the following information. The more information you provide, the more likely people will be able to help you.

Environment info

Operating System: Ubuntu 14.04.03 (running on docker. docker host is Ubuntu 16.04)

Compiler: gcc 4.8.4

Package used (Python/R/Scala/Julia): Python

MXNet commit hash (git rev-parse HEAD): b6e8eec8b94c70d9e116b3a4443ce75ce3e07aa2

If you are using python package, please provide

Python version and distribution: Python 2.7.6

Question

I think that the following three objects implement the same purpose differently.

lstm_bucketing https://github.com/dmlc/mxnet/blob/master/example/rnn/lstm_bucketing.py
rnn_cell_demo_batch_major https://github.com/dmlc/mxnet/blob/master/example/rnn/rnn_cell_demo.py
rnn_cell_demo_time_major https://github.com/dmlc/mxnet/blob/master/example/rnn-time-major/rnn_cell_demo.py

But the results of Perplexity convergence are different.

lstm_bucketing

bucket of len  32 : 52318 samples
Summary of dataset ==================
bucket of len  32 : 4131 samples
[Deprecation Warning] mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.
[07:36:30] src/operator/tensor/./matrix_op-inl.h:155: Using target_shape will be deprecated.
2017-01-23 07:36:30,132 Start training with [gpu(0)]
[07:36:30] src/operator/tensor/./matrix_op-inl.h:155: Using target_shape will be deprecated.
[07:36:30] src/operator/tensor/./matrix_op-inl.h:155: Using target_shape will be deprecated.
2017-01-23 07:36:36,009 Epoch[0] Batch [50]    Speed: 326.78 samplesˇ/sec    Train-Perplexity=819.327175
2017-01-23 07:36:41,052 Epoch[0] Batch [100]    Speed: 317.28 samples/sec    Train-Perplexity=37.890269
2017-01-23 07:36:46,056 Epoch[0] Batch [150]    Speed: 319.73 samples/sec    Train-Perplexity=30.136881
2017-01-23 07:36:51,061 Epoch[0] Batch [200]    Speed: 319.74 samples/sec    Train-Perplexity=27.374816
2017-01-23 07:36:56,057 Epoch[0] Batch [250]    Speed: 320.25 samples/sec    Train-Perplexity=24.731618
2017-01-23 07:37:01,096 Epoch[0] Batch [300]    Speed: 317.56 samples/sec    Train-Perplexity=23.069615
2017-01-23 07:37:06,104 Epoch[0] Batch [350]    Speed: 319.46 samples/sec    Train-Perplexity=25.119809
2017-01-23 07:37:11,156 Epoch[0] Batch [400]    Speed: 316.76 samples/sec    Train-Perplexity=23.873587
2017-01-23 07:37:16,182 Epoch[0] Batch [450]    Speed: 318.30 samples/sec    Train-Perplexity=22.034268
2017-01-23 07:37:21,158 Epoch[0] Batch [500]    Speed: 321.57 samples/sec    Train-Perplexity=21.762741
2017-01-23 07:37:26,070 Epoch[0] Batch [550]    Speed: 325.80 samples/sec    Train-Perplexity=20.518414
2017-01-23 07:37:31,077 Epoch[0] Batch [600]    Speed: 319.56 samples/sec    Train-Perplexity=22.382877
2017-01-23 07:37:36,062 Epoch[0] Batch [650]    Speed: 320.98 samples/sec    Train-Perplexity=20.621223
2017-01-23 07:37:41,014 Epoch[0] Batch [700]    Speed: 323.08 samples/sec    Train-Perplexity=21.058044
.
.
.
2017-01-23 07:44:21,580 Epoch[2] Batch [1300]    Speed: 321.41 samples/sec    Train-Perplexity=17.281973
2017-01-23 07:44:26,553 Epoch[2] Batch [1350]    Speed: 321.74 samples/sec    Train-Perplexity=14.715190
2017-01-23 07:44:31,533 Epoch[2] Batch [1400]    Speed: 321.29 samples/sec    Train-Perplexity=16.221104
2017-01-23 07:44:36,559 Epoch[2] Batch [1450]    Speed: 318.40 samples/sec    Train-Perplexity=15.390250
2017-01-23 07:44:41,632 Epoch[2] Batch [1500]    Speed: 315.39 samples/sec    Train-Perplexity=15.445390
2017-01-23 07:44:46,598 Epoch[2] Batch [1550]    Speed: 322.18 samples/sec    Train-Perplexity=14.912412
2017-01-23 07:44:51,602 Epoch[2] Batch [1600]    Speed: 319.79 samples/sec    Train-Perplexity=15.044475
2017-01-23 07:44:54,991 Epoch[2] Resetting Data Iterator
2017-01-23 07:44:54,991 Epoch[2] Time cost=162.703
2017-01-23 07:45:02,795 Epoch[2] Validation-Perplexity=15.626726

rnn_cell_demo_batch_major

2017-01-23 07:57:17,222 Epoch[0] Batch [50]    Speed: 556.27 samples/sec    Train-Perplexity=1077.814333
2017-01-23 07:57:20,054 Epoch[0] Batch [100]    Speed: 564.90 samples/sec    Train-Perplexity=64.062006
2017-01-23 07:57:22,903 Epoch[0] Batch [150]    Speed: 561.77 samples/sec    Train-Perplexity=47.671691
2017-01-23 07:57:25,758 Epoch[0] Batch [200]    Speed: 560.31 samples/sec    Train-Perplexity=42.398942
2017-01-23 07:57:28,610 Epoch[0] Batch [250]    Speed: 561.13 samples/sec    Train-Perplexity=44.818229
2017-01-23 07:57:31,400 Epoch[0] Batch [300]    Speed: 573.49 samples/sec    Train-Perplexity=41.456521
2017-01-23 07:57:34,191 Epoch[0] Batch [350]    Speed: 573.31 samples/sec    Train-Perplexity=43.779823
2017-01-23 07:57:36,985 Epoch[0] Batch [400]    Speed: 572.69 samples/sec    Train-Perplexity=39.637871
2017-01-23 07:57:39,769 Epoch[0] Batch [450]    Speed: 574.74 samples/sec    Train-Perplexity=38.659391
2017-01-23 07:57:42,539 Epoch[0] Batch [500]    Speed: 577.71 samples/sec    Train-Perplexity=42.503496
2017-01-23 07:57:45,333 Epoch[0] Batch [550]    Speed: 572.56 samples/sec    Train-Perplexity=40.609032
2017-01-23 07:57:48,093 Epoch[0] Batch [600]    Speed: 579.72 samples/sec    Train-Perplexity=41.916054
2017-01-23 07:57:50,799 Epoch[0] Batch [650]    Speed: 591.43 samples/sec    Train-Perplexity=39.585985
2017-01-23 07:57:53,520 Epoch[0] Batch [700]    Speed: 587.94 samples/sec    Train-Perplexity=35.613685
.
.
.
2017-01-23 08:01:40,015 Epoch[2] Batch [1300]    Speed: 577.24 samples/sec    Train-Perplexity=27.096243
2017-01-23 08:01:42,774 Epoch[2] Batch [1350]    Speed: 579.89 samples/sec    Train-Perplexity=29.384804
2017-01-23 08:01:45,585 Epoch[2] Batch [1400]    Speed: 569.14 samples/sec    Train-Perplexity=26.295231
2017-01-23 08:01:48,391 Epoch[2] Batch [1450]    Speed: 570.26 samples/sec    Train-Perplexity=25.926440
2017-01-23 08:01:51,185 Epoch[2] Batch [1500]    Speed: 572.68 samples/sec    Train-Perplexity=28.710357
2017-01-23 08:01:53,954 Epoch[2] Batch [1550]    Speed: 577.95 samples/sec    Train-Perplexity=26.068578
2017-01-23 08:01:56,813 Epoch[2] Batch [1600]    Speed: 559.57 samples/sec    Train-Perplexity=26.940867
2017-01-23 08:01:58,626 Epoch[2] Train-Perplexity=28.526618
2017-01-23 08:01:58,626 Epoch[2] Time cost=90.931
2017-01-23 08:02:04,401 Epoch[2] Validation-Perplexity=27.422306
Summary of dataset ==================
bucket of len  32 : 52318 samples
Summary of dataset ==================

rnn_cell_demo_time_major

2017-01-23 07:47:52,108 Epoch[0] Batch [50]    Speed: 543.10 samples/sec    Train-Perplexity=1093.169354
2017-01-23 07:47:55,007 Epoch[0] Batch [100]    Speed: 552.03 samples/sec    Train-Perplexity=51.530916
2017-01-23 07:47:58,004 Epoch[0] Batch [150]    Speed: 533.94 samples/sec    Train-Perplexity=58.214017
2017-01-23 07:48:00,944 Epoch[0] Batch [200]    Speed: 544.24 samples/sec    Train-Perplexity=50.646243
2017-01-23 07:48:03,919 Epoch[0] Batch [250]    Speed: 537.80 samples/sec    Train-Perplexity=47.026265
2017-01-23 07:48:06,870 Epoch[0] Batch [300]    Speed: 542.14 samples/sec    Train-Perplexity=45.660978
2017-01-23 07:48:09,855 Epoch[0] Batch [350]    Speed: 536.08 samples/sec    Train-Perplexity=42.589243
2017-01-23 07:48:12,809 Epoch[0] Batch [400]    Speed: 541.67 samples/sec    Train-Perplexity=41.940702
2017-01-23 07:48:15,803 Epoch[0] Batch [450]    Speed: 534.42 samples/sec    Train-Perplexity=42.167020
2017-01-23 07:48:18,713 Epoch[0] Batch [500]    Speed: 549.88 samples/sec    Train-Perplexity=39.090338
2017-01-23 07:48:21,552 Epoch[0] Batch [550]    Speed: 563.56 samples/sec    Train-Perplexity=40.942311
2017-01-23 07:48:24,505 Epoch[0] Batch [600]    Speed: 541.92 samples/sec    Train-Perplexity=35.940093
2017-01-23 07:48:27,455 Epoch[0] Batch [650]    Speed: 542.47 samples/sec    Train-Perplexity=36.640569
2017-01-23 07:48:30,409 Epoch[0] Batch [700]    Speed: 541.59 samples/sec    Train-Perplexity=40.224681
.
.
.
2017-01-23 07:52:30,523 Epoch[2] Batch [1300]    Speed: 532.77 samples/sec    Train-Perplexity=28.203658
2017-01-23 07:52:33,505 Epoch[2] Batch [1350]    Speed: 536.61 samples/sec    Train-Perplexity=28.522915
2017-01-23 07:52:36,469 Epoch[2] Batch [1400]    Speed: 539.91 samples/sec    Train-Perplexity=31.787062
2017-01-23 07:52:39,484 Epoch[2] Batch [1450]    Speed: 530.71 samples/sec    Train-Perplexity=27.522469
2017-01-23 07:52:42,486 Epoch[2] Batch [1500]    Speed: 532.92 samples/sec    Train-Perplexity=27.759309
2017-01-23 07:52:45,468 Epoch[2] Batch [1550]    Speed: 536.58 samples/sec    Train-Perplexity=24.959196
2017-01-23 07:52:48,478 Epoch[2] Batch [1600]    Speed: 531.64 samples/sec    Train-Perplexity=28.175112
2017-01-23 07:52:50,412 Epoch[2] Train-Perplexity=24.717883
2017-01-23 07:52:50,412 Epoch[2] Time cost=96.791
2017-01-23 07:52:56,148 Epoch[2] Validation-Perplexity=27.801519
Summary of dataset ==================
bucket of len  32 : 52318 samples
Summary of dataset ==================
bucket of len  32 : 4131 samples

The parameters are as follows at all implementation.

    buckets = [32]
    num_hidden = 200
    num_embed = 200
    num_lstm_layer = 2

    num_epoch = 3
    learning_rate = 0.01
    momentum = 0.0

What is the difference between these three implementations?

mz24cn commented 7 years ago

rnn_cell_demo.py named built in RNN parameters as "LSTM_bias", which is initialized as all zero. It significantly reduces the convergence speed. I changed the initializer and acquired similar convergence speed like others.

YujiOshima commented 7 years ago

Thank you @mz24cn ! Since I do not know how to initialize the bias to sym.RNN, can you show me if you have sample code to initialize the bias?

mz24cn commented 7 years ago

I will commit my code in next several days.

YujiOshima commented 7 years ago

Great! It's a big help. I am looking forward to your commit.

mz24cn commented 7 years ago

I have submitted a PR: https://github.com/dmlc/mxnet/pull/4819

phunterlau commented 7 years ago

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

apache / mxnet

Question about LSTM implementation: Perplexity convergence differs between lstm_bucketing.py and rnn_cell_demo.py #4774

Environment info

Question