geek-ai / Texygen

A text generation benchmarking platform
MIT License

Some questions about GSGAN #16

Closed luofuli closed 5 years ago

luofuli commented 6 years ago

Hello, I have a few questions about GSGAN.

  1. In your code, the inverse temperature parameter τ (self.tau in file GsganGenerator.py) is fixed at 10. However, in the original paper, the authors suggest starting with a relatively large τ and then annealing it toward zero during training (I sketch the schedule I have in mind below).

  2. What's more, I also don't understand why you add Gumbel noise before computing the output logits.

        def _pretrain_recurrence(i, x_t, h_tm1, g_predictions):
            h_t = self.g_recurrent_unit(x_t, h_tm1)
            h_t = self.add_gumbel(h_t)  # add g_i?????
            o_t = self.g_output_unit(h_t)
            g_predictions = g_predictions.write(i, tf.nn.softmax(o_t))  # batch x vocab_size
            x_tp1 = tf.nn.softmax(o_t / self.tau)
            return i + 1, x_tp1, h_t, g_predictions

    Could you explain this function in more detail?

  3. Finally, why don't you report the performance of GSGAN?
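For reference, here is roughly the kind of annealing schedule I have in mind, following the exponential decay used in the Gumbel-Softmax literature. It is only a sketch; the helper name annealed_tau and the constants are placeholders, not anything in Texygen:

    import math

    def annealed_tau(step, tau_0=10.0, tau_min=0.5, rate=1e-4):
        # tau = max(tau_min, tau_0 * exp(-rate * step)); tau_min > 0 keeps the
        # softmax well-defined instead of annealing all the way to zero.
        return max(tau_min, tau_0 * math.exp(-rate * step))

    # annealed_tau(0) == 10.0, and the value decays smoothly toward tau_min,
    # so the Gumbel-Softmax samples get progressively sharper as training goes on.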

Yaoming95 commented 6 years ago

Sorry for my late reply. For 1, we will add support for annealing this parameter in the next release. For 2, the notation h is not explicitly explained in the paper: in their figures it denotes the hidden layer, while in the equations we think it may denote the output layer (as in your question). Since there is only a linear transformation between the two,

    def create_output_unit(self, params):
        self.Wo = tf.Variable(self.init_matrix([self.hidden_dim, self.num_vocabulary]))
        self.bo = tf.Variable(self.init_matrix([self.num_vocabulary]))
        params.extend([self.Wo, self.bo])

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            return logits

        return unit

we think it won't have a large impact on the final results. For 3, as we explained in our paper, this model cannot generate meaningful sentences in the real-data experiment. In the original paper, the authors also did not conduct experiments on natural language.
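For comparison, adding the Gumbel noise to the output logits rather than to the hidden state would look roughly like the following. This is only a sketch of the standard Gumbel-Softmax formulation, not code we ship; add_gumbel_to_logits and _recurrence are placeholder names:

    def add_gumbel_to_logits(o_t, eps=1e-20):
        # Standard Gumbel noise: g = -log(-log(u)), u ~ Uniform(0, 1)
        u = tf.random_uniform(tf.shape(o_t), minval=0.0, maxval=1.0)
        g = -tf.log(-tf.log(u + eps) + eps)
        return o_t + g

    def _recurrence(i, x_t, h_tm1, g_predictions):
        h_t = self.g_recurrent_unit(x_t, h_tm1)  # hidden state, no noise here
        o_t = self.g_output_unit(h_t)            # logits = hidden_state * Wo + bo
        y_t = add_gumbel_to_logits(o_t)          # perturb the logits instead
        g_predictions = g_predictions.write(i, tf.nn.softmax(o_t))  # batch x vocab_size
        x_tp1 = tf.nn.softmax(y_t / self.tau)    # Gumbel-Softmax sample for the next input
        return i + 1, x_tp1, h_t, g_predictions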