Open · ehsan-soe opened 4 years ago
Hi @HongyuGong, thanks for the very nice work. I have a few questions about the paper that I am confused about, and I really hope you can help me with them:

I really appreciate your comments on this.
Hi @ehsan-soe,
Thanks for your interest in our work. Here are some explanations for your questions:
In the pre-training stage, we do not use parallel data, only the target corpus. A target sentence is fed into the encoder as input, and we use the same sentence as the expected output from the decoder. This works like a language model, translating a sentence to itself. The goal of pre-training is to obtain a good set of initial parameters to start from.
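In case it helps, here is a minimal sketch of that self-reconstruction step in PyTorch. The `model` object and its call signature are hypothetical placeholders for illustration, not the actual API of this repository:

```python
import torch.nn.functional as F

def pretrain_step(model, optimizer, target_ids):
    # target_ids: (batch, seq_len) token ids drawn from the target corpus.
    # The same sentence is both encoder input and decoder target, so the
    # model learns to "translate" a sentence to itself (teacher forcing).
    logits = model(src=target_ids, tgt=target_ids[:, :-1])  # hypothetical seq2seq forward
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab)
        target_ids[:, 1:].reshape(-1),        # targets shifted by one position
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```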
Again, we do not use parallel data, only the source corpus, for RL. At the reinforcement learning stage the model is trained with policy gradient, not MLE. The token probability P(y|s) can be computed from the encoder-decoder regardless of whether parallel data exists. At each step, we feed a source sentence to the encoder, and the decoder generates a sentence together with its probability P(y|s). The generated sentence is sent to the evaluators to collect the feedback Q. Given P(y|s) and Q, the model updates its parameters using Eq. (11).
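A minimal sketch of this update loop, assuming a REINFORCE-style objective: `model.sample` and `evaluators` are hypothetical helpers (the first samples a sentence and returns its per-token log-probabilities, the second returns the scalar feedback Q), and the exact form of Eq. (11) in the paper may include terms, such as a baseline, not shown here:

```python
import torch

def rl_step(model, optimizer, evaluators, source_ids):
    # Sample y ~ P(.|s) from the decoder, keeping log P(y_t | y_<t, s).
    sampled_ids, log_probs = model.sample(source_ids)  # log_probs: (batch, seq)
    with torch.no_grad():
        Q = evaluators(source_ids, sampled_ids)        # feedback, shape (batch,)
    # Policy-gradient loss: -E[ Q * sum_t log P(y_t | y_<t, s) ],
    # so gradient descent follows the policy-gradient direction.
    loss = -(Q * log_probs.sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```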
The current setup is to train a separate model for each transfer type and direction.
Feel free to let me know if you have any other questions.