Open eric-haibin-lin opened 5 years ago
Running `assert 1 == src_vocab.token_to_idx[src_vocab.idx_to_token[1]]` raises an `AssertionError`. The loaded vocabulary does not exhibit the properties of a `gluonnlp.Vocab`. That's why the warning is printed.
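To make the failing check concrete, here is a minimal sketch of the same round-trip test applied to every index. It uses a plain list and dict as stand-ins for `idx_to_token` and `token_to_idx` rather than a real `gluonnlp.Vocab`. The helper name `find_inconsistent_indices` and the toy data are hypothetical, and the duplicate token is only one way such an inconsistency could arise, not a diagnosis of the downloaded vocab.

```python
def find_inconsistent_indices(idx_to_token, token_to_idx):
    """Return every index i where token_to_idx[idx_to_token[i]] != i."""
    bad = []
    for i, token in enumerate(idx_to_token):
        # A consistent vocabulary maps each token back to its own index.
        if token_to_idx.get(token) != i:
            bad.append(i)
    return bad


# Toy data (hypothetical): the same token sits at positions 1 and 3,
# so the reverse lookup can only point at one of them.
idx_to_token = ["<unk>", "the", "cat", "the"]
token_to_idx = {token: i for i, token in enumerate(idx_to_token)}

print(find_inconsistent_indices(idx_to_token, token_to_idx))
```

With a real vocabulary, the same loop could be run over `src_vocab.idx_to_token` and `src_vocab.token_to_idx` to list all affected indices rather than only index 1.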
@szhengac shall we change the downloaded vocab?
I remember that the vocab in wmt14en-de does not specify the eos and pad tokens.
Changing the downloaded vocab will change the index mapping, because there is currently one invalid token at the beginning of `idx_to_token`. The embedding weights in the model files would therefore also need to be updated.
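A rough sketch of why the weights are affected: embedding rows are looked up by index, so dropping an entry at the front of `idx_to_token` shifts every later token to a new row. The snippet below rebuilds a weight table by token rather than by position. It uses nested lists instead of real model parameters, and `remap_embedding_rows`, the `<bad>` placeholder token, and all values are hypothetical illustrations, not the actual contents of the vocab or model files.

```python
def remap_embedding_rows(old_idx_to_token, new_idx_to_token, old_weights, fill_row):
    """Build a weight table for the new vocabulary by copying rows per token."""
    old_token_to_idx = {}
    for i, token in enumerate(old_idx_to_token):
        # Keep the first index seen for each token.
        old_token_to_idx.setdefault(token, i)

    new_weights = []
    for token in new_idx_to_token:
        old_i = old_token_to_idx.get(token)
        if old_i is None:
            # Tokens absent from the old vocabulary get a copy of fill_row.
            new_weights.append(list(fill_row))
        else:
            new_weights.append(list(old_weights[old_i]))
    return new_weights


# Toy data (hypothetical): the old vocabulary starts with one invalid entry.
old_idx_to_token = ["<bad>", "<unk>", "the", "cat"]
new_idx_to_token = ["<unk>", "the", "cat"]
old_weights = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

new_weights = remap_embedding_rows(
    old_idx_to_token, new_idx_to_token, old_weights, fill_row=[0.0, 0.0]
)
print(new_weights)
```

Copying by token keeps each word's vector attached to that word even though its index changes, which is the property a re-exported model file would need to preserve.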
Description
Running the default command and script for transformer training prints a warning about a corrupted index. This is misleading for users, who cannot tell whether the script still works, and should be fixed.
Error Message
To Reproduce
Steps to reproduce
Environment
gluonnlp commit = 76ca4d7