GabrielLin opened this issue 4 years ago
Thank you for pointing that out. That is useful for loading the model for further tuning, but for training from scratch, replace it with initializing all of the variables:
```python
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
```
A quick clarification though: if you do initialize the variables, you will lose the pre-training from the discourse marker (DM) prediction task, and the model will only be trained on the essay data. The pre-trained model that is loaded in this script is only trained on the DM prediction task, not on any essay scoring task.
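For reference, here is a minimal sketch of the two options side by side. It assumes the usual TF1 setup; `saver`, `checkpoint_path`, and `fine_tune` are illustrative names, not the identifiers used in the actual script:

```python
import tensorflow as tf

# Illustrative sketch: `saver`, `checkpoint_path`, and `fine_tune` are
# placeholder names, not the ones used in the script itself.
sess = tf.Session(config=config)

if fine_tune:
    # Keep the DM pre-training: restore the checkpoint and fine-tune on essay data.
    saver.restore(sess, checkpoint_path)
else:
    # Train from scratch: all weights are re-initialized, so the DM pre-training is lost.
    sess.run(tf.global_variables_initializer())
```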
Thanks. Your reply is very detailed and clear. How can I train this discourse marker (DM) model myself?
For the DM prediction task, we used the book corpus from https://www.smashwords.com/. Due to the nature of the copyright, we cannot share the data; however, you can download it and create your own dataset. The details of the discourse markers we used and how they were categorized are in our paper.
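As a rough illustration of how such a dataset could be put together from raw book text (the marker list below is only a toy example; the actual markers and their eight categories are the ones described in the paper):

```python
# Illustrative only: the real discourse markers and their 8 categories
# are the ones listed in the paper, not this toy mapping.
DM_CATEGORIES = {
    "however": "contrast",
    "but": "contrast",
    "because": "cause",
    "therefore": "result",
    "for example": "expansion",
}

def make_dm_examples(sentences):
    """Build (previous sentence, current sentence, label) examples whenever a
    sentence starts with a known discourse marker; the marker itself is removed
    so the model has to predict it from context."""
    examples = []
    for prev, curr in zip(sentences, sentences[1:]):
        lowered = curr.lower()
        for marker, label in DM_CATEGORIES.items():
            if lowered.startswith(marker + " ") or lowered.startswith(marker + ","):
                stripped = curr[len(marker):].lstrip(" ,")
                examples.append((prev, stripped, label))
                break
    return examples
```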
Could you please share your training code for the DM prediction task? I would like to create my own data set and try it.
The layers and loss functions are in the code:
```python
# ...
num_classes_s = 8
# ...
# flatten the attention output for the sentence-order (DM) prediction head
attention_output_sentorder = tf.reshape(attention_output, [-1, HIDDEN_SIZE*2*2*3])

# three fully connected layers on top of the flattened attention output
Ws1 = tf.Variable(tf.truncated_normal([HIDDEN_SIZE*2*2*3, LAYER_1], stddev=0.1))
bs1 = tf.Variable(tf.truncated_normal([LAYER_1]))
y_hats1 = tf.nn.xw_plus_b(attention_output_sentorder, Ws1, bs1)

W_s2 = tf.Variable(tf.truncated_normal([LAYER_1, LAYER_2], stddev=0.1))
b_s2 = tf.Variable(tf.truncated_normal([LAYER_2]))
y_hat_s2 = tf.nn.xw_plus_b(y_hats1, W_s2, b_s2)

W_s = tf.Variable(tf.truncated_normal([LAYER_2, num_classes_s], stddev=0.1))
b_s = tf.Variable(tf.truncated_normal([num_classes_s]))
y_hat_s = tf.nn.xw_plus_b(y_hat_s2, W_s, b_s)

y_preds_s = tf.argmax(y_hat_s, axis=1)
loss_s = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_hat_s, labels=target_ph_s))
# ...

# second optimizer for sentence order
gradients_s = tf.gradients(loss_s, params)
clipped_gradients_s, _ = tf.clip_by_global_norm(gradients_s, max_gradient_norm)
optimizer_s = optimizer.apply_gradients(zip(clipped_gradients_s, params))
```
Depending on how you format your input data, you can feed it in and train using optimizer_s.
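For example, a minimal training loop could look like the sketch below; `input_ph`, `next_batch`, `train_data`, `batch_size`, and `num_steps` are assumed names, while `target_ph_s`, `loss_s`, and `optimizer_s` come from the snippet above:

```python
# Hypothetical feed loop: only `target_ph_s`, `loss_s`, and `optimizer_s`
# come from the snippet above; the other names are assumptions.
for step in range(num_steps):
    batch_inputs, batch_labels = next_batch(train_data, batch_size)
    _, batch_loss = sess.run(
        [optimizer_s, loss_s],
        feed_dict={input_ph: batch_inputs, target_ph_s: batch_labels})
    if step % 100 == 0:
        print("step {}: DM loss = {:.4f}".format(step, batch_loss))
```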
I hope this helps.
You are extremely helpful. Thank you very much.
In DM_BCA_train.ipynb, around line 313, a pre-trained checkpoint is loaded. Could you please tell me why? I thought this step was training from scratch, so there would be no pre-trained weights. I feel a little confused. Thanks.