geek-ai / Texygen

A text generation benchmarking platform
MIT License

Question about Synthetic Data Experiment #10

Closed optimass closed 6 years ago

optimass commented 6 years ago

Hi guys. Thanks for the great repo!

I'm looking at Section 4.2 (Synthetic Data Experiment) of the Texygen paper. I have 2 questions.

1) Are all the models trained on the exact same Oracle, or is it re-initialized at every run?

2) Did you do a hyperparameter search for all the models (and presented the best run for each), or did you run one run for each model (with the default parameters in the repo)?

Thanks a lot!

Yaoming95 commented 6 years ago

For question 1, we use the same settings in the Oracle. We use the same seeds to ensure the runs share the same parameters, as the code shows:

```python
self.Wi = tf.Variable(tf.random_normal([self.emb_dim, self.hidden_dim], 0.0, 1000000.0, seed=111))
self.Ui = tf.Variable(tf.random_normal([self.hidden_dim, self.hidden_dim], 0.0, 1000000.0, seed=211))
self.bi = tf.Variable(tf.random_normal([self.hidden_dim, ], 0.0, 1000000.0, seed=311))

self.Wf = tf.Variable(tf.random_normal([self.emb_dim, self.hidden_dim], 0.0, 1000000.0, seed=114))
self.Uf = tf.Variable(tf.random_normal([self.hidden_dim, self.hidden_dim], 0.0, 1000000.0, seed=115))
self.bf = tf.Variable(tf.random_normal([self.hidden_dim, ], 0.0, 1000000.0, seed=116))

self.Wog = tf.Variable(tf.random_normal([self.emb_dim, self.hidden_dim], 0.0, 1000000.0, seed=997))
self.Uog = tf.Variable(tf.random_normal([self.hidden_dim, self.hidden_dim], 0.0, 1000000.0, seed=998))
self.bog = tf.Variable(tf.random_normal([self.hidden_dim, ], 0.0, 1000000.0, seed=999))

self.Wc = tf.Variable(tf.random_normal([self.emb_dim, self.hidden_dim], 0.0, 1000000.0, seed=110))
self.Uc = tf.Variable(tf.random_normal([self.hidden_dim, self.hidden_dim], 0.0, 1000000.0, seed=111))
self.bc = tf.Variable(tf.random_normal([self.hidden_dim, ], 0.0, 1000000.0, seed=112))
```

The only difference across runs is that the Oracle may not generate exactly the same training data.
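The fixed-seed idea above can be illustrated with a minimal NumPy sketch (this is a stand-in for the repo's TensorFlow code, with a hypothetical `init_oracle_weight` helper and toy dimensions): seeding each weight's random draw means two independent runs construct byte-identical Oracle parameters, so every model is benchmarked against the same ground-truth distribution.

```python
import numpy as np

def init_oracle_weight(emb_dim, hidden_dim, seed):
    # Each weight matrix gets its own fixed seed, mirroring the repo's
    # pattern, so re-running the script recreates an identical Oracle.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0, size=(emb_dim, hidden_dim))

# Two independent "runs" with the same seed produce identical parameters.
w_run1 = init_oracle_weight(32, 64, seed=111)
w_run2 = init_oracle_weight(32, 64, seed=111)
assert np.array_equal(w_run1, w_run2)

# A different seed yields a different Oracle.
w_other = init_oracle_weight(32, 64, seed=211)
assert not np.array_equal(w_run1, w_other)
```

Note that seeding only fixes the parameters; the sequences the Oracle samples as training data can still differ between runs, which is the residual randomness mentioned above.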

For question 2, for some parameters we use the values provided by the original authors (for example, SeqGAN). For hyperparameters shared by different models, such as those of the CNN discriminator, we use the defaults in this repo. This is because we want to provide a fair comparison environment.

optimass commented 6 years ago

OK, this is really helpful, thank you!