CR-Gjx / LeakGAN

Code for the AAAI 2018 paper "Long Text Generation via Adversarial Training with Leaked Information": text generation using a GAN combined with hierarchical reinforcement learning.
https://arxiv.org/abs/1709.08624

Why CNN instead of LSTM? #7

Closed AranKomat closed 6 years ago

AranKomat commented 6 years ago

There are some recent text-generation GAN papers in which the discriminator is an LSTM rather than a CNN. Why was a CNN used in this paper?

CR-Gjx commented 6 years ago
  1. This work inherits from SeqGAN (http://www.aaai.org/Conferences/AAAI/2017/PreliminaryPapers/12-Yu-L-14344.pdf), which uses a CNN as its discriminator. Other strong works, such as RankGAN and Adversarial Feature Matching for Text Generation, also use a CNN discriminator.
  2. The CNN is just one choice of discriminator. In fact, we also tried an LSTM as the discriminator, and it also performed well, but it is limited by speed. (A minimal sketch of such a CNN discriminator is shown below.)
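
For reference, here is a minimal sketch of the kind of Kim-style CNN text discriminator used in SeqGAN and LeakGAN. It is written in PyTorch for brevity (the actual repo uses TensorFlow), and all layer sizes are illustrative assumptions, not the paper's hyperparameters:

```python
# Minimal sketch of a CNN text discriminator; sizes are illustrative.
import torch
import torch.nn as nn

class CNNDiscriminator(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64,
                 filter_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # One convolution per filter width, sliding over token embeddings.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, kernel_size=k) for k in filter_sizes
        )
        self.fc = nn.Linear(num_filters * len(filter_sizes), 1)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        x = self.embedding(tokens).transpose(1, 2)    # (batch, emb_dim, seq_len)
        # Convolve and max-pool over time for each filter width, then concat.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)           # fixed-size feature vector
        return self.fc(features)                      # real/fake logit
```

Max-pooling over time yields a fixed-size feature vector regardless of sentence length; in LeakGAN it is this feature layer of the discriminator that is "leaked" to guide the generator.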
AranKomat commented 6 years ago

Thanks for your clarification, and in particular for telling me that the LSTM did not work significantly better than the CNN.

CR-Gjx commented 6 years ago

Sorry, I meant that the LSTM is limited by speed because it parallelizes poorly; its performance relative to the CNN depends on the specific task. (The sketch below illustrates the parallelism gap.)
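
To make the speed point concrete, here is a small illustration (a sketch with arbitrary sizes, not a benchmark of the actual LeakGAN code) of why a convolutional discriminator parallelizes better than a recurrent one:

```python
import torch
import torch.nn as nn

batch, seq_len, dim = 64, 20, 128
x = torch.randn(batch, seq_len, dim)  # a batch of embedded sequences

conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
lstm = nn.LSTM(dim, dim, batch_first=True)

# CNN: all timesteps are convolved in one call; positions are independent,
# so the hardware can process the whole sequence in parallel.
cnn_out = conv(x.transpose(1, 2))

# LSTM: internally a sequential scan over seq_len steps; each hidden state
# depends on the previous one, so the steps cannot run in parallel.
lstm_out, _ = lstm(x)
```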

AranKomat commented 6 years ago

I see. It'd be interesting to try a Transformer. Sorry, but I have three questions remaining.

  1. On the SeqGAN issue page, LantaoYu theoretically justified training G once for every 15 D update steps. But did you empirically verify that the 1:15 ratio works well?

  2. The adversarial training ran for only 800 batches. Is this small number due to fast convergence, or to the considerable amount of training time required?

  3. Does the discriminator of LeakGAN also receive unfinished real sentences, to discriminate them from unfinished fake sentences? If all the discriminator receives are finished real sentences plus finished and unfinished fake sentences, it is trivial to tell them apart by judging any sequence that ends mid-sentence (e.g., on a comma) to be fake. But then the discriminator cannot learn much. How is this avoided in your algorithm? (A hypothetical sketch of the kind of prefix sampling I mean follows this list.)
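
To spell out what the third question is getting at, here is a hypothetical sketch of one way to remove the trivial cue: truncate real sequences to random prefix lengths as well, so completeness carries no label information. This only illustrates the question; it is not a description of LeakGAN's actual training procedure, and `make_disc_batch` is a made-up helper:

```python
import random

def make_disc_batch(real_seqs, fake_seqs, max_len=20, pad_id=0):
    """Build a discriminator batch in which both real (label 1) and fake
    (label 0) token-id sequences are cut to random prefix lengths and padded,
    so ending mid-sentence is no longer a giveaway."""
    examples = []
    for seq, label in [(s, 1) for s in real_seqs] + [(s, 0) for s in fake_seqs]:
        cut = random.randint(1, min(len(seq), max_len))  # random prefix length
        prefix = seq[:cut] + [pad_id] * (max_len - cut)
        examples.append((prefix, label))
    random.shuffle(examples)
    return examples
```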