Closed: mmcs-work closed this issue 4 years ago
It's possible that 500 steps aren't enough for it to learn how to output the EOS token. Please train for longer and see if the issue goes away. If not, please reopen the issue.
I checked this after you mentioned that point. This time I trained for 5000 steps, but the result is the same. Also, two things that don't fit well with this behavior:

1. The loss curve shows a gradual decrease and is already very small even in the initial steps of training. (Maybe the way the data is handled during train and eval differs in the context of loss calculation.)
2. With 500 fine-tune steps, if I give the model the default SPM vocabulary (i.e. do not specify `output_features`), I get the correct result.
Based on point 2, I have checked the SentencePiece model training portion (following the documentation steps there). I couldn't find any problem from that perspective. (@adarob I guess we can reopen the issue.)
Thanks for checking. @nshazeer do you have any thoughts on this?
The problem resides in the default special-token IDs generated by SPM, which are not identical to the T5 defaults.
You need to change the SPM training parameters, something like: `spm.SentencePieceTrainer.Train('--input=./vocab.txt --model_prefix=./model --pad_id=0 --eos_id=1 --unk_id=2 --bos_id=-1 --model_type=word --hard_vocab_limit=false')`
I think it would be better to mention this somewhere in the README.
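Spelled out with comments, that training call might look like the sketch below (the input file and model prefix are placeholders, not taken from the original report):

```python
import sentencepiece as spm

# Train a SentencePiece model whose special-token IDs match T5's expectations:
# pad=0, eos=1, and no bos token. SentencePiece's own defaults (unk=0, bos=1,
# eos=2, no pad) do not match, which is what produces the extra tokens.
spm.SentencePieceTrainer.Train(
    "--input=./vocab.txt "       # placeholder: training text, one entry per line
    "--model_prefix=./model "    # placeholder: writes model.model / model.vocab
    "--pad_id=0 --eos_id=1 --unk_id=2 --bos_id=-1 "
    "--model_type=word --hard_vocab_limit=false"
)
```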
Ah yes, that's right. I'd checked the SentencePiece defaults when this issue was opened, but I must have misread. The only ones you need to set are `eos_id=1` and `pad_id=0`. I will update the README.
With these settings, it worked. Thanks.
Leaving open until #250 is in.
After training T5 on the IMDB dataset from scratch, when the model is run against the test set, it predicts the correct output but some extra tokens are also emitted as part of the target. As a result, the accuracy in this step is 0.
I am using a custom SentencePiece model built from the IMDB vocabulary, and I have identified that the problem arises only when this custom model is used. When I use the default SPM model, it works correctly.
I have tried to check the trained SPM model. I haven't noticed anything that can be attributed to this zero accuracy case.
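One quick sanity check (not from the original report) is to print the trained model's special-token IDs with the sentencepiece Python API and compare them with the T5 defaults (pad=0, eos=1) mentioned earlier in the thread; the model path below is a placeholder:

```python
import sentencepiece as spm

# Load the trained SentencePiece model (placeholder path).
sp = spm.SentencePieceProcessor()
sp.Load("./model.model")

# T5 expects pad=0 and eos=1; SentencePiece's defaults are unk=0, bos=1,
# eos=2 with no pad token, so a model trained with defaults will mismatch.
print("pad:", sp.pad_id())
print("eos:", sp.eos_id())
print("unk:", sp.unk_id())
print("bos:", sp.bos_id())
```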
In the prediction output we can see the `positive` prediction along with extra `??` tokens. Am I missing any configuration, or am I passing the custom vocabulary path incorrectly? (But this is most likely not the case, since just changing the SPM model gives the correct result.) Here is the notebook link.
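For context, a minimal sketch of how a custom SentencePiece vocabulary is typically passed via `output_features` when defining a task is shown below; the path and the exact `t5.data` names (`SentencePieceVocabulary`, `Feature`, `add_eos`) should be treated as assumptions to check against the installed t5 version, not as the notebook's actual code:

```python
import t5

# Placeholder path to the custom SentencePiece model.
VOCAB_PATH = "./model.model"

vocab = t5.data.SentencePieceVocabulary(VOCAB_PATH)

# Use the custom vocabulary for both inputs and targets. add_eos appends the
# EOS token (id 1) to each sequence so the model can learn when to stop.
output_features = {
    "inputs": t5.data.Feature(vocabulary=vocab, add_eos=True),
    "targets": t5.data.Feature(vocabulary=vocab, add_eos=True),
}
```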