Open SkyAndCloud opened 6 years ago
Thanks for your comments. But I think my implementation is right. The RNNSearch model has two distinctive features: first, it uses the hidden state of the last time step to compute the attention vector and then feeds it into the RNN; second, it uses MaxOut in the output layer. My implementation is at: https://github.com/ZiJianZhao/NMT-Research-Reproductions-PyTorch/blob/master/xnmt/modules/old_decoder.py#L134 Can you explain more clearly what modification of the GRU's inner structure you mean?
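To make the two features concrete, here is a minimal sketch of one decoder step along those lines: attention scored from the *previous* hidden state, then a GRU update, then a MaxOut output layer. All names, sizes, and the simplified additive-attention scorer are my own illustration, not the repository's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
EMB, HID, CTX, K = 8, 16, 16, 2  # K = MaxOut pool size

class RNNSearchDecoderStep(nn.Module):
    """One decoder step: attention from the previous hidden state,
    then a GRU update, then a MaxOut output layer."""
    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(HID + CTX, 1)      # simplified attention scorer
        self.gru = nn.GRUCell(EMB + CTX, HID)
        self.maxout_in = nn.Linear(EMB + HID + CTX, HID * K)

    def forward(self, emb, prev_h, enc_outs):
        # emb: (batch, EMB); prev_h: (batch, HID); enc_outs: (src_len, batch, CTX)
        src_len = enc_outs.size(0)
        # attention is computed from the hidden state of the *previous* time step
        q = prev_h.unsqueeze(0).expand(src_len, -1, -1)
        scores = self.attn(torch.cat([q, enc_outs], dim=-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=0)
        ctx = (alpha.unsqueeze(-1) * enc_outs).sum(0)          # (batch, CTX)
        h = self.gru(torch.cat([emb, ctx], dim=-1), prev_h)
        # MaxOut: project to HID * K pieces, then take the max over each group of K
        t = self.maxout_in(torch.cat([emb, h, ctx], dim=-1))
        out = t.view(-1, HID, K).max(dim=-1).values            # (batch, HID)
        return h, out
```

The returned `out` would then feed a final softmax projection over the target vocabulary.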
Emmm, I think I made a mistake. Your implementation should be correct: you concatenate the context vector and the target embedding vector into a single vector, which acts as the GRU's input alongside the hidden state. That is identical to vanilla GroundHog, where the context vector and the target embedding vector are fed to the GRU as separate inputs, as long as you don't have layer normalization or any other operation that would affect the combined vector's weights.
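The equivalence claimed here is easy to verify: a single weight matrix applied to the concatenated input splits column-wise into one matrix per input, which is exactly the separate-inputs formulation. A small numerical check (dimensions are arbitrary):

```python
import torch

torch.manual_seed(0)
d_emb, d_ctx, d_out = 3, 4, 5
emb, ctx = torch.randn(d_emb), torch.randn(d_ctx)

# One weight matrix over the concatenated input ...
W = torch.randn(d_out, d_emb + d_ctx)
combined = W @ torch.cat([emb, ctx])

# ... is the same as separate matrices for each input,
# i.e. the GroundHog formulation with individual weights.
W_emb, W_ctx = W[:, :d_emb], W[:, d_emb:]
separate = W_emb @ emb + W_ctx @ ctx

assert torch.allclose(combined, separate, atol=1e-6)
```

Any extra operation applied to the concatenated vector before the matrix multiply (e.g. layer normalization) would break this factorization, which is why that caveat matters.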
What is your implementation's performance on the NIST Chinese-English dataset? Hi. Recently I have also been trying to reproduce a PyTorch version of RNNSearch. I find that your code directly uses PyTorch's GRU implementation, which is different from vanilla RNNSearch. The paper's authors modify the GRU's inner structure, so you need an RNNSearch-style GRU. I think your code's unsatisfactory performance may be due to this difference.
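For reference, the RNNSearch-style GRU (as in Bahdanau et al.'s appendix) lets the context vector enter each gate through its own weight matrix, and the reset gate multiplies only the recurrent term in the candidate state. A sketch under those assumptions (the class and parameter names are mine):

```python
import torch
import torch.nn as nn

class RNNSearchGRUCell(nn.Module):
    """Sketch of the GroundHog/RNNSearch GRU variant: the context vector
    has its own weight matrices for each gate and the candidate state."""
    def __init__(self, d_emb, d_ctx, d_hid):
        super().__init__()
        self.W = nn.Linear(d_emb, 3 * d_hid)               # embedding -> z, r, h~
        self.U = nn.Linear(d_hid, 3 * d_hid, bias=False)   # prev state -> z, r, h~
        self.C = nn.Linear(d_ctx, 3 * d_hid, bias=False)   # context    -> z, r, h~

    def forward(self, emb, ctx, h):
        wz, wr, wh = self.W(emb).chunk(3, dim=-1)
        uz, ur, uh = self.U(h).chunk(3, dim=-1)
        cz, cr, ch = self.C(ctx).chunk(3, dim=-1)
        z = torch.sigmoid(wz + uz + cz)                    # update gate
        r = torch.sigmoid(wr + ur + cr)                    # reset gate
        # reset gate only rescales the recurrent term, not the context
        h_tilde = torch.tanh(wh + r * uh + ch)
        return (1 - z) * h + z * h_tilde
```

With `nn.GRUCell` fed the concatenation of embedding and context, the gate inputs are the same linear combination, so the main remaining difference from this variant is in library-specific details such as where the reset gate is applied and bias handling.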