Can't reproduce the accuracy with pre-trained Large Scale Word Language Model (gbw dataset)

ciyongch commented 4 years ago

Description

Can't reproduce the same accuracy with pre-trained Large Scale Word Language Model based on gbw dataset, following the guideline. The accuracy given in tutorial is: [1] LSTM-2048-512 (Test PPL 43.62)

What I got is: Best validation loss 9.40, val ppl 12130.84 Best test loss 9.42, test ppl 12303.21

@eric-haibin-lin

Error Message

BigRNN(
  (embedding): HybridSequential(
    (0): Embedding(793471 -> 512, float32)
    (1): Dropout(p = 0.1, axes=())
  )
  (encoder): HybridSequentialRNNCell(
  (0): LSTMPCell(512 -> 8192 -> 512)
  (1): DropoutCell(rate=0.1, axes=())
  )
  (decoder): Dense(512 -> 793471, linear)
)
Vocab(size=793471, unk="<unk>", reserved="['<pad>', '<eos>']")
Best validation loss 9.40, val ppl 12130.84
Best test loss 9.42, test ppl 12303.21

To Reproduce

Run the script awd_lstm.py with MXNet master branch.

Steps to reproduce

python awd_lstm.py

What have you tried to solve it?

None

Environment

We recommend using our script for collecting the diagnositc information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

# paste outputs here
----------Python Info----------
Version      : 3.7.5
Compiler     : GCC 7.3.0
Build        : ('default', 'Oct 25 2019 15:51:11')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /home/ciyong/miniconda3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 2.0.0
Directory    : /tmp/mxnet/incubator-mxnet/python/mxnet
Num GPUs     : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-3.10.0-957.el7.x86_64-x86_64-with-centos-7.6.1810-Core
system       : Linux
node         : mlt2-clx094
release      : 3.10.0-957.el7.x86_64
version      : #1 SMP Thu Nov 8 23:39:32 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                112
On-line CPU(s) list:   0-111
Thread(s) per core:    2
Core(s) per socket:    28
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
Stepping:              7
CPU MHz:               3310.070
CPU max MHz:           4000.0000
CPU min MHz:           1000.0000
BogoMIPS:              5400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              39424K
NUMA node0 CPU(s):     0-27,56-83
NUMA node1 CPU(s):     28-55,84-111
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni spec_ctrl intel_stibp flush_l1d arch_capabilities
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0103 sec, LOAD: 2.1315 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0013 sec, LOAD: 1.9952 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0066 sec, LOAD: 2.3918 sec.
Timing for D2L: http://d2l.ai, DNS: 0.1313 sec, LOAD: 0.5946 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0058 sec, LOAD: 0.7973 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 5.7738 sec, LOAD: 2.1621 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0015 sec, LOAD: 11.5346 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.010091543197631836 sec.

eric-haibin-lin commented 4 years ago

@ciyongch have you tried mxnet 1.6?

ciyongch commented 4 years ago

@ciyongch have you tried mxnet 1.6?

Tried with both mxnet 1.6.0 and mxnet-cu90 1.6.0, got same/similar accuracy as my previous results. It's probably due to the incorrect pre-trained params download from S3 server, can you help to check that?

1) mxnet 1.6.0

----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/ciyong/miniconda3/envs/mx_release/lib/python3.6/site-packages/mxnet
Num GPUs     : 0
Commit Hash   : 6eec9da55c5096079355d1f1a5fa58dcf35d6752

1.1) output

BigRNN(
  (embedding): HybridSequential(
    (0): Embedding(793471 -> 512, float32)
    (1): Dropout(p = 0.1, axes=())
  )
  (encoder): HybridSequentialRNNCell(
  (0): LSTMPCell(512 -> 8192 -> 512)
  (1): DropoutCell(rate=0.1, axes=())
  )
  (decoder): Dense(512 -> 793471, linear)
)
Vocab(size=793471, unk="<unk>", reserved="['<pad>', '<eos>']")
Best validation loss 9.40, val ppl 12130.84
Best test loss 9.42, test ppl 12303.21

2) mxnet-cu90 1.6.0

----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/ciyong/miniconda3/envs/mxnet_gpu/lib/python3.6/site-packages/mxnet
Num GPUs     : 1
Commit Hash   : 6eec9da55c5096079355d1f1a5fa58dcf35d6752

2.1) output

BigRNN(
  (embedding): HybridSequential(
    (0): Embedding(793471 -> 512, float32)
    (1): Dropout(p = 0.1, axes=())
  )
  (encoder): HybridSequentialRNNCell(
  (0): LSTMPCell(512 -> 8192 -> 512)
  (1): DropoutCell(rate=0.1, axes=())
  )
  (decoder): Dense(512 -> 793471, linear)
)
Vocab(size=793471, unk="<unk>", reserved="['<pad>', '<eos>']")
Best validation loss 9.41, val ppl 12194.26
Best test loss 9.42, test ppl 12366.94

eric-haibin-lin commented 4 years ago

@ciyongch any update after using the original vocab from the pre-trained model?

ciyongch commented 4 years ago

@eric-haibin-lin After switching to GBW vocab, I got the better result as below, but there's run-to-run variance, not sure if it's related to the data loader.

(Gluon) Best validation loss 3.92, val ppl 50.46
(Gluon) Best test loss 3.92, test ppl 50.30

szha commented 4 years ago

Given that most of the recent demand is on transformer based models, we probably won't be able to get to this soon. @ciyongch let me know your use case if you still need this.

ciyongch commented 4 years ago

Hi @szha, it's ok to close it now, sorry for forgetting close it.

dmlc / gluon-nlp