Status: closed (ciyongch closed this issue 4 years ago)
@ciyongch have you tried mxnet 1.6?
Tried both mxnet 1.6.0 and mxnet-cu90 1.6.0 and got the same (or very similar) accuracy as in my previous results. It may be that the pre-trained params downloaded from the S3 server are incorrect; could you help check that? (A checksum sketch follows the two outputs below.)
1) mxnet 1.6.0
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/ciyong/miniconda3/envs/mx_release/lib/python3.6/site-packages/mxnet
Num GPUs : 0
Commit Hash : 6eec9da55c5096079355d1f1a5fa58dcf35d6752
1.1) output
BigRNN(
(embedding): HybridSequential(
(0): Embedding(793471 -> 512, float32)
(1): Dropout(p = 0.1, axes=())
)
(encoder): HybridSequentialRNNCell(
(0): LSTMPCell(512 -> 8192 -> 512)
(1): DropoutCell(rate=0.1, axes=())
)
(decoder): Dense(512 -> 793471, linear)
)
Vocab(size=793471, unk="<unk>", reserved="['<pad>', '<eos>']")
Best validation loss 9.40, val ppl 12130.84
Best test loss 9.42, test ppl 12303.21
2) mxnet-cu90 1.6.0
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/ciyong/miniconda3/envs/mxnet_gpu/lib/python3.6/site-packages/mxnet
Num GPUs : 1
Commit Hash : 6eec9da55c5096079355d1f1a5fa58dcf35d6752
2.1) output
BigRNN(
(embedding): HybridSequential(
(0): Embedding(793471 -> 512, float32)
(1): Dropout(p = 0.1, axes=())
)
(encoder): HybridSequentialRNNCell(
(0): LSTMPCell(512 -> 8192 -> 512)
(1): DropoutCell(rate=0.1, axes=())
)
(decoder): Dense(512 -> 793471, linear)
)
Vocab(size=793471, unk="<unk>", reserved="['<pad>', '<eos>']")
Best validation loss 9.41, val ppl 12194.26
Best test loss 9.42, test ppl 12366.94
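Regarding the suspicion of a bad download from S3: one local sanity check is to verify the cached parameter files against the hash embedded in their file names. This is only a sketch and assumes the usual MXNet/GluonNLP model-zoo convention of caching files under ~/.mxnet/models as <model_name>-<first 8 hex chars of the file's SHA-1>.params.

import glob
import hashlib
import os

# Verify each cached .params file against the short SHA-1 in its file name
# (assumes the <name>-<8-hex-sha1-prefix>.params naming convention).
for path in glob.glob(os.path.expanduser('~/.mxnet/models/*.params')):
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            sha1.update(chunk)
    expected = os.path.basename(path).rsplit('-', 1)[-1][:-len('.params')]
    print(path, 'OK' if sha1.hexdigest().startswith(expected) else 'MISMATCH')

A MISMATCH would indicate a corrupted download; deleting that file and letting the script fetch it again would be the next thing to try.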
@ciyongch any update after using the original vocab from the pre-trained model?
@eric-haibin-lin After switching to the GBW vocab from the pre-trained model, I got much better results (below), though there is some run-to-run variance; I'm not sure whether that comes from the data loader.
(Gluon) Best validation loss 3.92, val ppl 50.46
(Gluon) Best test loss 3.92, test ppl 50.30
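For anyone hitting the same numbers: the fix is to evaluate with the vocabulary the model was trained on rather than one rebuilt locally, since a different token-to-index mapping makes the pre-trained embedding and softmax weights meaningless. A minimal sketch, assuming GluonNLP 0.x and that big_rnn_lm_2048_512 is the model-zoo name for this LSTM-2048-512 model:

import mxnet as mx
import gluonnlp as nlp

# get_model returns the pre-trained network together with the vocab it was
# trained with; use this vocab for evaluation instead of rebuilding one.
model, vocab = nlp.model.get_model('big_rnn_lm_2048_512',
                                   dataset_name='gbw',
                                   pretrained=True,
                                   ctx=mx.cpu())
print(model)
print(vocab)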
Given that most of the recent demand is for transformer-based models, we probably won't be able to get to this soon. @ciyongch let me know your use case if you still need this.
Hi @szha, it's ok to close it now, sorry for forgetting to close it.
Description
I can't reproduce the reported accuracy of the pre-trained Large Scale Word Language Model on the GBW dataset when following the guideline. The accuracy given in the tutorial is:
[1] LSTM-2048-512 (Test PPL 43.62)
What I got is:
Best validation loss 9.40, val ppl 12130.84
Best test loss 9.42, test ppl 12303.21
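For scale: the reported perplexity is just the exponential of the average cross-entropy loss, so the gap between the expected and observed numbers is enormous. A quick check in plain Python:

import math

# Perplexity is exp(average cross-entropy loss).
print(math.exp(9.42))   # ~12333; the ~12100-12300 ppl above corresponds to a loss of ~9.42
print(math.exp(3.92))   # ~50.4, the level later reached after switching to the GBW vocab
print(math.log(43.62))  # ~3.78, the loss that the advertised 43.62 ppl corresponds to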
@eric-haibin-lin
Error Message
To Reproduce
Run the script awd_lstm.py with the MXNet master branch.
Steps to reproduce
python awd_lstm.py
What have you tried to solve it?
None
Environment
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
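The diagnose command itself was not pasted here; the two "MXNet Info" blocks earlier in the thread appear to be its output. As a rough stand-in, the same basic facts can be collected with plain MXNet calls (assuming MXNet 1.x):

import os
import mxnet as mx

# Minimal environment summary, mirroring the "MXNet Info" fields above.
print('Version   :', mx.__version__)
print('Directory :', os.path.dirname(mx.__file__))
print('Num GPUs  :', mx.context.num_gpus())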