geek-ai / Texygen

A text generation benchmarking platform
MIT License

Some questions about GSGAN #16

Closed luofuli closed 5 years ago

luofuli commented 6 years ago

Hello, I have a few questions about GSGAN.

  1. In your code, the inverse temperature parameter τ (self.tau in file GsganGenerator.py) is fixed at 10. However, in the original paper, the authors suggest starting with a relatively large τ and then annealing it toward zero during training (I sketch the schedule I have in mind below).

  2. What's more, I also don't understand why you add Gumbel noise before computing the output logits.

        def _pretrain_recurrence(i, x_t, h_tm1, g_predictions):
            h_t = self.g_recurrent_unit(x_t, h_tm1)
            h_t = self.add_gumbel(h_t)  # add g_i?????
            o_t = self.g_output_unit(h_t)
            g_predictions = g_predictions.write(i, tf.nn.softmax(o_t))  # batch x vocab_size
            x_tp1 = tf.nn.softmax(o_t / self.tau)
            return i + 1, x_tp1, h_t, g_predictions

    Could you explain this function in more detail?

  3. Finally, why don't you report the performance of GSGAN?
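For reference, here is roughly the kind of annealing schedule I have in mind, following the exponential decay used in the Gumbel-Softmax literature. It is only a sketch; the helper name annealed_tau and the constants are placeholders, not anything in Texygen:

    import math

    def annealed_tau(step, tau_0=10.0, tau_min=0.5, rate=1e-4):
        # tau = max(tau_min, tau_0 * exp(-rate * step)); tau_min > 0 keeps the
        # softmax well-defined instead of annealing all the way to zero.
        return max(tau_min, tau_0 * math.exp(-rate * step))

    # annealed_tau(0) == 10.0, and the value decays smoothly toward tau_min,
    # so the Gumbel-Softmax samples get progressively sharper as training goes on.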

Yaoming95 commented 6 years ago

Sorry for my late reply. For 1, we will add support for annealing this parameter in the next release. For 2, the notation h is not explicitly explained in the paper: in their figures it denotes the hidden layer, while in the equations we think it may denote the output layer (as in your question). Since there is only a linear transformation between the two,

    def create_output_unit(self, params):
        self.Wo = tf.Variable(self.init_matrix([self.hidden_dim, self.num_vocabulary]))
        self.bo = tf.Variable(self.init_matrix([self.num_vocabulary]))
        params.extend([self.Wo, self.bo])

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            return logits

        return unit

we think it won't have a large impact on the final results. For 3, as we explained in our paper, this model cannot generate meaningful sentences in the real-data experiment. In the original paper, the authors also did not conduct experiments on natural language.
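For comparison, adding the Gumbel noise to the output logits rather than to the hidden state would look roughly like the following. This is only a sketch of the standard Gumbel-Softmax formulation, not code we ship; add_gumbel_to_logits and _recurrence are placeholder names:

    def add_gumbel_to_logits(o_t, eps=1e-20):
        # Standard Gumbel noise: g = -log(-log(u)), u ~ Uniform(0, 1)
        u = tf.random_uniform(tf.shape(o_t), minval=0.0, maxval=1.0)
        g = -tf.log(-tf.log(u + eps) + eps)
        return o_t + g

    def _recurrence(i, x_t, h_tm1, g_predictions):
        h_t = self.g_recurrent_unit(x_t, h_tm1)  # hidden state, no noise here
        o_t = self.g_output_unit(h_t)            # logits = hidden_state * Wo + bo
        y_t = add_gumbel_to_logits(o_t)          # perturb the logits instead
        g_predictions = g_predictions.write(i, tf.nn.softmax(o_t))  # batch x vocab_size
        x_tp1 = tf.nn.softmax(y_t / self.tau)    # Gumbel-Softmax sample for the next input
        return i + 1, x_tp1, h_t, g_predictions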