HongyuGong / TextStyleTransfer

Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus

Questions about the paper. #1

Open ehsan-soe opened 4 years ago

ehsan-soe commented 4 years ago

Hi @HongyuGong Thanks for the very nice work. I have a few questions about the paper that I am confused about, and I hope you can help me with them:

  1. In the Generator Pre-training section, I wonder what the exact input and output of your encoder-decoder are. It is stated that the generator is trained in a supervised manner. Does that mean you used parallel data for pre-training the generator? If not, what are the inputs and outputs?
  2. I have the same question for the RL setup: to compute P(y|s) in Eq. 8 or 11, you must have a model trained with MLE, right? For training with the MLE loss, are you using parallel data?
  3. And finally, I wonder whether you trained a separate model for each type and direction (resulting in 4 models), and if not, how you inform the model whether the transfer is pos-to-neg or neg-to-pos?

I would really appreciate your comments on this.

HongyuGong commented 4 years ago

Hi @ehsan-soe,

Thanks for your interest in our work. Here are some explanations for your questions:

  1. In the pre-training stage, we do not use parallel data but only the target corpus. A target sentence is fed into the encoder as input, and we use the same sentence as the expected output of the decoder. This is like a language model that translates a sentence to itself. The goal of pre-training is to get a good set of initial parameters to start with (a rough sketch of this step is given after this list).

  2. Again, we do not use parallel data but only the source corpus for RL. The model is trained with policy gradient, not MLE, at the reinforcement learning stage. The token probability P(y|s) can be computed from the encoder-decoder without any parallel data. At each step, we feed a source sentence to the encoder, and the decoder generates a sentence together with its probability P(y|s). The generated sentence is sent to the evaluators to collect the feedback Q. Given P(y|s) and Q, the model updates its parameters using Eq. (11) (see the second sketch after this list).

  3. The current set-up is to train a separate model for each type and direction.
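
For concreteness, here is a minimal PyTorch-style sketch of the reconstruction pre-training step in point 1. This is not the repository's code; the `Seq2Seq` module and all names in it are hypothetical placeholders. The only point it illustrates is that the same target-corpus sentence serves as both the encoder input and the decoder target:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Hypothetical encoder-decoder used only to illustrate the idea."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.embed(src))             # encode the sentence
        dec_out, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec_out)                         # logits over the vocabulary

def pretrain_step(model, optimizer, sentence_ids, pad_id=0):
    """One supervised step: a target-style sentence is reconstructed from itself."""
    src = sentence_ids                   # encoder input  = a target-corpus sentence
    tgt_in = sentence_ids[:, :-1]        # decoder input (shifted right, starts with BOS)
    tgt_out = sentence_ids[:, 1:]        # decoder target = the very same sentence
    logits = model(src, tgt_in)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1),
        ignore_index=pad_id)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```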
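
And a matching sketch of the policy-gradient update in point 2, reusing the hypothetical `Seq2Seq` module above. This is an illustrative REINFORCE-style step, not the authors' implementation: the decoder samples a sentence from P(y|s), the hypothetical `evaluators` callable returns a scalar feedback Q per sample, and the loss scales the sampled sentence's negative log-probability by Q, in the spirit of Eq. (11):

```python
import torch

def rl_step(model, optimizer, src_ids, evaluators, bos_id, max_len=30):
    """One policy-gradient step; EOS handling and baselines are omitted for brevity."""
    _, h = model.encoder(model.embed(src_ids))           # encode the source sentence
    token = src_ids.new_full((src_ids.size(0), 1), bos_id)
    log_probs, sampled = [], []
    for _ in range(max_len):                             # sample y ~ P(y|s) token by token
        dec_out, h = model.decoder(model.embed(token), h)
        dist = torch.distributions.Categorical(logits=model.out(dec_out[:, -1]))
        step_token = dist.sample()                       # (batch,)
        log_probs.append(dist.log_prob(step_token))
        token = step_token.unsqueeze(1)
        sampled.append(token)
    y = torch.cat(sampled, dim=1)                        # the generated sentences
    Q = evaluators(src_ids, y).detach()                  # style + content feedback, (batch,)
    # REINFORCE-style loss: weight the sampled sentence's log-probability by the reward Q
    loss = -(Q * torch.stack(log_probs, dim=1).sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```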

Feel free to let me know if you have any other questions.