I use the same methods as in the code, but ported to TensorFlow.
I don't add noise to the input embedding, and I get a huge drop in BLEU scores. Actually, the model doesn't seem to converge.
Is there any difference in adversarial training between language modeling and translation?
Any help is appreciated! Thanks!