asyml / texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
https://asyml.io
Apache License 2.0

BLEU score suddenly drops while training with interpolation #71

Closed · perprit closed this issue 5 years ago

perprit commented 5 years ago

Hi, thanks for the great work. I tried training an NMT model on IWSLT14 with the interpolation algorithm (https://github.com/asyml/texar/tree/master/examples/seq2seq_exposure_bias), but during training the BLEU score suddenly dropped to 0.0000 at around epoch 11.

Below is the training log where I ran into the problem:

training epoch=9, lambdas=[0.04, 0.06, 0.0]
step=0, loss=48.1200, lambdas=[0.04, 0.06, 0.0]
step=500, loss=50.3024, lambdas=[0.04, 0.06, 0.0]
step=1000, loss=49.0209, lambdas=[0.04, 0.06, 0.0]
step=1500, loss=44.0876, lambdas=[0.04, 0.06, 0.0]
step=2000, loss=54.4154, lambdas=[0.04, 0.06, 0.0]
step=2500, loss=53.7328, lambdas=[0.04, 0.06, 0.0]
step=3000, loss=54.9698, lambdas=[0.04, 0.06, 0.0]
step=3500, loss=67.9883, lambdas=[0.04, 0.06, 0.0]
step=4000, loss=51.2655, lambdas=[0.04, 0.06, 0.0]
step=4500, loss=56.5977, lambdas=[0.04, 0.06, 0.0]
val epoch=9, BLEU=27.1300; best-ever=27.1300
test epoch=9, BLEU=25.2700
==================================================
training epoch=10, lambdas=[0.04, 0.06, 0.0]
step=0, loss=60.8326, lambdas=[0.04, 0.06, 0.0]
step=500, loss=39.8571, lambdas=[0.04, 0.06, 0.0]
step=1000, loss=52.8363, lambdas=[0.04, 0.06, 0.0]
step=1500, loss=47.0654, lambdas=[0.04, 0.06, 0.0]
step=2000, loss=62.2711, lambdas=[0.04, 0.06, 0.0]
step=2500, loss=64.2932, lambdas=[0.04, 0.06, 0.0]
step=3000, loss=49.2814, lambdas=[0.04, 0.06, 0.0]
step=3500, loss=53.3860, lambdas=[0.04, 0.06, 0.0]
step=4000, loss=52.4406, lambdas=[0.04, 0.06, 0.0]
step=4500, loss=53.0982, lambdas=[0.04, 0.06, 0.0]
val epoch=10, BLEU=27.0600; best-ever=27.1300
test epoch=10, BLEU=25.3000
==================================================
training epoch=11, lambdas=[0.1, 0.0, 0.0]
step=0, loss=43.5935, lambdas=[0.1, 0.0, 0.0]
step=500, loss=6.5808, lambdas=[0.1, 0.0, 0.0]
step=1000, loss=3.1541, lambdas=[0.1, 0.0, 0.0]
step=1500, loss=2.2091, lambdas=[0.1, 0.0, 0.0]
step=2000, loss=2.9512, lambdas=[0.1, 0.0, 0.0]
step=2500, loss=1.2280, lambdas=[0.1, 0.0, 0.0]
step=3000, loss=1.1169, lambdas=[0.1, 0.0, 0.0]
step=3500, loss=1.3231, lambdas=[0.1, 0.0, 0.0]
step=4000, loss=1.2344, lambdas=[0.1, 0.0, 0.0]
step=4500, loss=1.1418, lambdas=[0.1, 0.0, 0.0]
val epoch=11, BLEU=0.0000; best-ever=27.1300
test epoch=11, BLEU=0.0000  // <-- BLEU suddenly dropped!
==================================================
training epoch=12, lambdas=[0.1, 0.0, 0.0]
step=0, loss=1.7246, lambdas=[0.1, 0.0, 0.0]
step=500, loss=1.3470, lambdas=[0.1, 0.0, 0.0]
step=1000, loss=1.0208, lambdas=[0.1, 0.0, 0.0]
step=1500, loss=1.6566, lambdas=[0.1, 0.0, 0.0]
step=2000, loss=1.4075, lambdas=[0.1, 0.0, 0.0]
step=2500, loss=1.5193, lambdas=[0.1, 0.0, 0.0]
step=3000, loss=1.1760, lambdas=[0.1, 0.0, 0.0]
step=3500, loss=0.8260, lambdas=[0.1, 0.0, 0.0]
step=4000, loss=2.0769, lambdas=[0.1, 0.0, 0.0]
step=4500, loss=1.1434, lambdas=[0.1, 0.0, 0.0]
val epoch=12, BLEU=0.0000; best-ever=27.1300
test epoch=12, BLEU=0.0000

And test_results10.txt (before the drop) looks like this:

you know , one of the intense pleasures of travel and one of the delights of ethnographic research is the opportunity to live amongst those who have not forgotten the old ways , who still feel their past in the wind , touch it in stones polished by rain , taste it in the bitter leaves of plants . ||| you know , one of the great <UNK> travel in travel , and one of the pleasure of the <UNK> research is to live with the people who remember remember the old days , they can feel their past , they <UNK> the the <UNK> of the plants .
just to know that jaguar shamans still journey beyond the milky way , or the myths of the inuit elders still resonate with meaning , or that in the himalaya , the buddhists still pursue the breath of the dharma , is to really remember the central revelation of anthropology , and that is the idea that the world in which we live does not exist in some absolute sense , but is just one model of reality , the consequence of one particular set of adaptive choices that our lineage made , albeit successfully , many generations ago . ||| just the know that <UNK> still still beyond the milky way , or the importance of the council of the inuit , is full of the the the the the the the the world , which is the the world that the world that we &apos;re in ,
and of course , we all share the same adaptive imperatives . ||| and of course , we all share the same <UNK> .
we &apos;re all born . we all bring our children into the world . ||| we &apos;re all born . we &apos;re bringing kids to the world .
we go through initiation rites . ||| we go through <UNK> .

And test_results11.txt (after the BLEU dropped) looks like this:

you know , one of the intense pleasures of travel and one of the delights of ethnographic research is the opportunity to live amongst those who have not forgotten the old ways , who still feel their past in the wind , touch it in stones polished by rain , taste it in the bitter leaves of plants . ||| you
just to know that jaguar shamans still journey beyond the milky way , or the myths of the inuit elders still resonate with meaning , or that in the himalaya , the buddhists still pursue the breath of the dharma , is to really remember the central revelation of anthropology , and that is the idea that the world in which we live does not exist in some absolute sense , but is just one model of reality , the consequence of one particular set of adaptive choices that our lineage made , albeit successfully , many generations ago . ||| just
and of course , we all share the same adaptive imperatives . ||| and
we &apos;re all born . we all bring our children into the world . ||| we
we go through initiation rites . ||| we

I guess it has something to do with the lambda values changing, but I have no idea why right now. The only change I made to the configs was setting batch_size to 32 (from 64), and I'm using Python 3.5.2 with tensorflow-gpu 1.8.0. Can you guess any reason why? Thanks.

tanyuqian commented 5 years ago

Your lambdas at epoch 11 are [0.1, 0.0, 0.0]. In that case, the sequences used for training are generated entirely by your model itself, so it does make sense that your model collapses.
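(For context, a minimal sketch of the source mixing the lambdas appear to control, assuming, as the log above suggests, that the first component weights the model's own samples and the second weights the ground truth; `pick_token_source` and its arguments are hypothetical, not texar's actual API.)

```python
import numpy as np

def pick_token_source(lambdas, model_token, gold_token, reward_token, rng):
    """Draw which source supplies the next training token.

    lambdas: unnormalized weights over [model sample, ground truth, reward-based];
    this ordering is inferred from the log above, not from texar's code.
    """
    probs = np.asarray(lambdas, dtype=float)
    probs /= probs.sum()              # e.g. [0.04, 0.96, 0.0] -> mostly ground truth
    source = rng.choice(3, p=probs)
    return (model_token, gold_token, reward_token)[source]

# With lambdas = [0.1, 0.0, 0.0] the normalized probabilities are [1, 0, 0]:
# every training token is the model's own sample, so training reinforces
# whatever the model already emits (here, stopping after one word), the loss
# collapses toward 0, and BLEU drops to 0.0000.
rng = np.random.default_rng(0)
print(pick_token_source([0.1, 0.0, 0.0], "model", "gold", "reward", rng))  # always "model"
```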

Our initial lambdas are [0.04, 0.96, 0.0] (here). Your lambda setting is different from ours.
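(And a worked reading of the annealing visible in the log, hedged since the exact schedule lives in the example's config: between epoch 10 and epoch 11, a delta of 0.06 moves from the second lambda to the first. From the README's mistaken init, that single step exhausts the ground-truth weight; from the intended init, it barely dents it. `anneal` below is illustrative, not texar's code.)

```python
DELTA = 0.06  # mass moved per anneal step, read off the log above

def anneal(lams, delta=DELTA):
    # Shift `delta` of weight from the ground-truth lambda (index 1) to the
    # self-sample lambda (index 0), clamping at zero.
    self_w, gold_w, reward_w = lams
    moved = min(delta, gold_w)
    return [self_w + moved, gold_w - moved, reward_w]

print(anneal([0.04, 0.06, 0.0]))  # ~[0.1, 0.0, 0.0]: pure self-sampling (the collapse)
print(anneal([0.04, 0.96, 0.0]))  # ~[0.1, 0.9, 0.0]: still dominated by ground truth
```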

perprit commented 5 years ago

Hi, thanks for the comment. I set the lambdas to [0.04, 0.06, 0.0] because that is what the README says (screenshot attached). Sorry, I didn't understand what the initial lambda values meant when I first ran this code, which is why I didn't notice a trivial error like this. I think the README needs to be fixed anyway.
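(For reference, the corrected setting implied by the comment above; the variable name is illustrative, since the README's literal text isn't shown in this thread.)

```python
lambdas = [0.04, 0.96, 0.0]  # intended init; the README mistakenly had 0.06 where 0.96 should be
```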

tanyuqian commented 5 years ago

Oh, I'm sorry about that. It's my fault. I will fix the typo soon.

Thank you very much for pointing this out.