Closed — XiaoqingNLP closed this issue 6 years ago
I suspect you changed the default parameters of the transformer architecture, which is quite sensitive to hyperparameters. All models in THUMT support multi-GPU training, and with the default parameters the transformer architecture generally performs much better than rnnsearch.
I tried to reproduce the experiment with the transformer model on multiple GPUs, but I found that many lines in the decoded *trans.norm file are empty. The rnnsearch model trained on a single GPU does not show this behavior. TensorFlow: master branch.
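To quantify the symptom described above, a minimal sketch like the following can count how many decoded hypotheses are empty. The function name and the file path are illustrative assumptions, not part of THUMT:

```python
# Hypothetical helper to count empty hypotheses in a decoded output file.
# Not part of THUMT; the path passed in is whatever *trans.norm file
# the decoder produced.
def count_empty_lines(path):
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    empty = sum(1 for line in lines if not line.strip())
    return empty, len(lines)

# Example usage (path is an assumption):
#   empty, total = count_empty_lines("newstest.trans.norm")
#   print(empty, "of", total, "hypotheses are empty")
```

If a large fraction of lines comes back empty only in the multi-GPU transformer run, that points at decoding or batching rather than the evaluation script.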