ZiJianZhao / NMT-Research-Reproductions-PyTorch

Reproducing neural machine translation research results in PyTorch

Should reimplement GRU if you want to reproduce the RNNSearch paper #1

Open SkyAndCloud opened 6 years ago

SkyAndCloud commented 6 years ago

Hi. What is your implementation's performance on the NIST Chinese-English dataset? I have also been trying to reproduce RNNSearch in PyTorch recently. I noticed that your code uses PyTorch's built-in GRU directly, which differs from the original RNNSearch: the paper's authors modify the GRU's internal structure, so you need an RNNSearch-style GRU. I suspect this difference is why your results are unsatisfactory.
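If I remember correctly, the decoder GRU in the paper's appendix takes the context vector $c_i$ as an extra input to every gate, roughly:

$$
\begin{aligned}
r_i &= \sigma(W_r E y_{i-1} + U_r s_{i-1} + C_r c_i) \\
z_i &= \sigma(W_z E y_{i-1} + U_z s_{i-1} + C_z c_i) \\
\tilde{s}_i &= \tanh(W E y_{i-1} + U [r_i \circ s_{i-1}] + C c_i) \\
s_i &= (1 - z_i) \circ s_{i-1} + z_i \circ \tilde{s}_i
\end{aligned}
$$

whereas `nn.GRU` only sees a single flat input vector at each step.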

ZiJianZhao commented 6 years ago

Thanks for your comments, but I believe my implementation is correct. The RNNSearch model has two distinguishing features: first, it uses the hidden state of the previous time step to compute the attention context vector, which is then fed into the RNN; second, it uses maxout in the output layer. My implementation is here: https://github.com/ZiJianZhao/NMT-Research-Reproductions-PyTorch/blob/master/xnmt/modules/old_decoder.py#L134, and a stripped-down sketch of that step is included below. Could you explain more precisely what modification to the GRU's inner structure you mean?
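For reference, here is a minimal sketch of the decoder step I am describing. It is illustrative only: the class name, the simplified attention scoring, and the dimensions are mine, not the exact code in old_decoder.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStepSketch(nn.Module):
    """Simplified single-step attention decoder (illustrative, not the repo's exact code)."""
    def __init__(self, emb_dim, hid_dim, ctx_dim, vocab_size, pool_size=2):
        super().__init__()
        self.attn = nn.Linear(hid_dim + ctx_dim, 1)                 # simplified additive-style scoring
        self.gru = nn.GRU(emb_dim + ctx_dim, hid_dim, batch_first=True)
        self.maxout_in = nn.Linear(emb_dim + ctx_dim + hid_dim, hid_dim * pool_size)
        self.out = nn.Linear(hid_dim, vocab_size)
        self.pool_size = pool_size

    def forward(self, y_emb, prev_hidden, enc_outputs):
        # y_emb: (B, 1, emb), prev_hidden: (1, B, hid), enc_outputs: (B, T, ctx)
        B, T, _ = enc_outputs.size()

        # 1) attention is computed from the hidden state of the *previous* time step
        query = prev_hidden[-1].unsqueeze(1).expand(B, T, -1)
        scores = self.attn(torch.cat([query, enc_outputs], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=-1)
        context = torch.bmm(alpha.unsqueeze(1), enc_outputs)        # (B, 1, ctx)

        # 2) the context vector is concatenated with the target embedding and fed to the GRU
        rnn_input = torch.cat([y_emb, context], dim=-1)
        output, hidden = self.gru(rnn_input, prev_hidden)

        # 3) maxout over [embedding; context; hidden] before the output projection
        m = self.maxout_in(torch.cat([y_emb, context, output], dim=-1))
        m = m.view(B, 1, -1, self.pool_size).max(dim=-1)[0]
        logits = self.out(m)
        return logits, hidden, alpha
```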

SkyAndCloud commented 6 years ago

Emmm, I think I made a mistake. Your implementation should be correct: you concatenate the context vector and the target embedding into a single vector, which serves as the GRU's input (alongside the hidden state). That is identical to vanilla GroundHog, where the context vector and the target embedding are fed to the GRU as separate inputs, as long as there is no layer normalization or other operation applied to the combined vector that would change how it is weighted.
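To make the equivalence concrete, here is a tiny check (hypothetical dimensions, not taken from the repo): applying one weight matrix to the concatenation gives the same result as applying two separate matrices to the embedding and the context, i.e. W [y; c] = W_y y + W_c c, so the gate pre-activations inside the GRU come out the same either way.

```python
import torch

torch.manual_seed(0)
emb_dim, ctx_dim, hid_dim = 4, 6, 5

y = torch.randn(1, emb_dim)    # target embedding
c = torch.randn(1, ctx_dim)    # attention context

# one matrix applied to the concatenated input ...
W = torch.randn(hid_dim, emb_dim + ctx_dim)
combined = torch.cat([y, c], dim=-1) @ W.t()

# ... equals two separate matrices, as in GroundHog's formulation
W_y, W_c = W[:, :emb_dim], W[:, emb_dim:]
separate = y @ W_y.t() + c @ W_c.t()

print(torch.allclose(combined, separate))  # True
```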