google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
https://arxiv.org/abs/1910.10683
Apache License 2.0

Eval predicting wrong every time (extra symbols along with correct prediction) #246

Closed: mmcs-work closed this issue 4 years ago

mmcs-work commented 4 years ago

After training T5 from scratch on the IMDB dataset, when the model is run against the test set it predicts the correct output, but some extra tokens are emitted as part of the target as well. As a result, the accuracy in this step is 0.

Now, I have tried to use a custom SentencePiece model built from the IMDB vocabulary, and I have identified that the problem arises only when this custom SentencePiece model is used. When I use the default SPM model, it works correctly.

I have also inspected the trained SPM model itself, but I haven't noticed anything that could be attributed to this zero-accuracy case.
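
For reference, a quick way to see which IDs a trained SentencePiece model assigned to its special tokens is a snippet like the one below (a minimal sketch; the model path is hypothetical, and the T5-side conventions noted in the comments are the ones confirmed later in this thread):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("custom_imdb.model")  # hypothetical path to the custom SPM model

# SentencePiece defaults are unk=0, bos=1, eos=2, pad=-1 (disabled),
# whereas T5 assumes pad_id=0 and eos_id=1.
print("pad:", sp.pad_id())
print("eos:", sp.eos_id())
print("unk:", sp.unk_id())
print("bos:", sp.bos_id())
```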

INFO:tensorflow:decoded 1024: sentiment: "good (not great) little horror film with a high ""creep"" factor (not to be confused with a 1991 movie by the same name, or the more recent (2001) campfire stories). central tale of stranded teens telling ghost stories around a campfire in spooky woods nicely leads into, and ties together the different stories that make up the bulk of the movie (watch for ron livingston (office space, band of brothers) and jennifer macdonald in a spirited, sexy segment (""the honeymoon"")). solid acting and a few truly ""scary"" moments make this an above-average chiller. good example of interesting story
INFO:tensorflow:            -> positive ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ 
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:decoded 2048: sentiment: "at the time of writing this review it would seem that over 50 ⁇  of imdb voters had given this film a rating of either a 10 or a 1. i can only surmise then that those giving it a 10 were either cast or crew members. they say that given enough monkeys and enough time and enough typewriters, those monkeys, just by random proddings at the keyboard, would eventually type out the complete works of shakespeare. however, i seriously doubt that given the same number of monkeys and time, you could find a single one to give this movie a rating of 10.i patiently watched the first half, foolishly assuming that the
INFO:tensorflow:            -> negative ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ 
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:decoded 4096: sentiment: "had this been the original 1914 version of tess of the storm country (also starring mary pickford), i probably would have rated it a lot higher, as this sort of extreme melodrama and sentimentality was pretty typical of the teens. however, by 1922, this film was already starting to show its age. and, compared to many of ms. pickfords other films (such as daddy longlegs, sparrows, my best girl and suds), tess comes up a tad short--and not every pickford film merits a 10 (even if she was ""america's sweetheart""). now
INFO:tensorflow:            -> positive ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ 
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:decoded 8192: sentiment: plot: ed and alice are engaged. they live together and are living the dull life. he has slept around before meeting alice. she has a lot less experience. she decides she needs to sleep around before marrying. he very reluctantly agrees they should both see other people for a while. at first he is not really into it. his wild days are behind him and he is simply content. until one day alice comes back and tells him she made out with some random guy; who of course starts to fall for her. of course this is a bad idea which causes extreme strain on the relationship.good movie
INFO:tensorflow:            -> positive ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ 
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Enqueue next (1) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (1) batch(es) of data from outfeed.
INFO:tensorflow:Stop infeed thread controller
INFO:tensorflow:Shutting down InfeedController thread.
INFO:tensorflow:InfeedController received shutdown signal, stopping.
INFO:tensorflow:Infeed thread finished, shutting down.
INFO:tensorflow:infeed marked as finished
INFO:tensorflow:Stop output thread controller
INFO:tensorflow:Shutting down OutfeedController thread.
INFO:tensorflow:OutfeedController received shutdown signal, stopping.
INFO:tensorflow:Outfeed thread finished, shutting down.
INFO:tensorflow:outfeed marked as finished
INFO:tensorflow:Shutdown TPU system.
INFO:tensorflow:prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
INFO:tensorflow:eval/imdb_custom1/accuracy at step 500: 0.000
INFO:tensorflow:eval/imdb_custom2/accuracy at step 500: 0.000

As you can see, the correct prediction appears along with the ⁇ symbols. Am I missing some configuration, or am I passing the custom vocabulary path incorrectly? (The latter is most likely not the case, since just swapping in the default SPM model gives the correct result.)

Here is the notebook link.

adarob commented 4 years ago

It's possible that 500 steps isn't enough for it to learn how to output the EOS token. Please train for longer and see if the issue goes away. If not, please reopen the issue.

mmcs-work commented 4 years ago

> It's possible that 500 steps isn't enough for it to learn how to output the EOS token. Please train for longer and see if the issue goes away. If not, please reopen the issue.

I checked this after you mentioned it. This time I trained for 5000 steps, but the result is the same. Also, two things don't fit well with this behavior:

  1. The loss curve shows a gradual decrease and reaches a very small value even in the initial steps of training. (Maybe the data is handled differently between training and eval in the context of the loss calculation.)

  2. With 500 fine-tuning steps, if I provide the model with the default SPM vocabulary (i.e., do not specify output_features), I get the correct result. (See the sketch after this comment for how output_features is wired up.)

Based on point 2, I have checked the SentencePiece model training portion (I followed the documentation steps there) and couldn't find any problem from that perspective. (@adarob I guess we can reopen this issue.)
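
For context, the output_features mentioned in point 2 is how a custom vocabulary gets attached to a Task. A minimal sketch of that wiring is below; all paths and names are hypothetical, the dataset function is a stand-in for the real IMDB loader from the notebook, and the exact keyword arguments can vary between t5 versions:

```python
import t5
import tensorflow as tf

# Hypothetical path to the custom SentencePiece model; extra_ids=100
# matches the number of sentinel tokens T5 uses for pre-training.
vocab = t5.data.SentencePieceVocabulary("custom_imdb.model", extra_ids=100)

def imdb_dataset_fn(split, shuffle_files=False):
  # Stand-in for the real IMDB loader.
  del split, shuffle_files
  return tf.data.Dataset.from_tensor_slices(
      {"inputs": ["sentiment: a fine little film"],
       "targets": ["positive"]})

t5.data.TaskRegistry.add(
    "imdb_custom",
    dataset_fn=imdb_dataset_fn,
    splits=["train", "validation"],
    text_preprocessor=[],
    # Omitting output_features falls back to the default T5 vocabulary,
    # which is the case that worked in point 2.
    output_features={
        "inputs": t5.data.Feature(vocabulary=vocab, add_eos=True),
        "targets": t5.data.Feature(vocabulary=vocab, add_eos=True),
    },
    metric_fns=[t5.evaluation.metrics.accuracy])
```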

adarob commented 4 years ago

Thanks for checking. @nshazeer do you have any thoughts on this?

agemagician commented 4 years ago

The problem resides in the default special tokens generated by SPM, which are not identical to T5's default special tokens.

You need to change the SPM training parameters, something like:

spm.SentencePieceTrainer.Train('--input=./vocab.txt --model_prefix=./model --pad_id=0 --eos_id=1 --unk_id=2 --bos_id=-1 --model_type=word --hard_vocab_limit=false')

I think it would be better to mention that somewhere in the README files.

adarob commented 4 years ago

Ah yes, that's right. I'd checked the SentencePiece defaults when this issue was opened, but I must have misread. The only ones you need to set are eos_id=1 and pad_id=0. I will update the README.
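
Putting the two comments together, a minimal trainer invocation would look something like the sketch below (input path and vocab size are hypothetical). Note that even though only pad_id and eos_id matter to T5, unk_id and bos_id still have to be moved, because their SentencePiece defaults (unk=0, bos=1) would otherwise collide with the new pad/eos IDs:

```python
import sentencepiece as spm

# pad_id=0 and eos_id=1 are what T5 requires; unk_id=2 and bos_id=-1
# just keep the remaining special tokens out of the way.
spm.SentencePieceTrainer.Train(
    "--input=imdb_text.txt --model_prefix=imdb --vocab_size=32000 "
    "--pad_id=0 --eos_id=1 --unk_id=2 --bos_id=-1")
```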

mmcs-work commented 4 years ago

With these settings, it worked. Thanks.

adarob commented 4 years ago

Leaving open until #250 is in.