Farahn / AES

Automatic Essay Scoring

Restore a checkpoint when training from scratch #6

Open GabrielLin opened 4 years ago

GabrielLin commented 4 years ago

In DM_BCA_train.ipynb, around line 313, it loads a pre-trained checkpoint:

MODEL_PATH = 'bca_dm_model/model300-20500-1800-1800'
saver = tf.train.Saver()

sess = tf.Session()
saver.restore(sess, MODEL_PATH)

Could you please tell me why? I think this step is meant to train from scratch, so there should be no pre-trained weights. I feel a little confused. Thanks.

Farahn commented 4 years ago

Thank you for pointing that out. This is useful for running the model for further tuning, but for training from scratch, replace this with initializing all variables:

sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())

Farahn commented 4 years ago

A quick clarification though: if you do initialize the variables, you will lose the pre-training from the discourse marker (DM) prediction task, and the model will only be trained on the essay data. The pre-trained model loaded in this script is trained only on the DM prediction task, not on any essay scoring task.
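To make the two setups concrete, here is a minimal sketch of both options, assuming the graph, MODEL_PATH, and config are already defined as in the notebook:

# Option 1: fine-tune from the DM-pretrained checkpoint
saver = tf.train.Saver()
sess = tf.Session(config=config)
saver.restore(sess, MODEL_PATH)

# Option 2: train from scratch (the DM pre-training is discarded)
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())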

GabrielLin commented 4 years ago

Thanks. Your reply is very detailed and clear. How can I train this discourse marker (DM) model myself?

Farahn commented 4 years ago

For the DM prediction task, we used the book corpus from https://www.smashwords.com/. Due to the nature of the copyright, we cannot share the data; however, you can download it and create your own data set. The details of the discourse markers we used and how they were categorized are in our paper.
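As a rough illustration (not from the repository), one way to build such a data set is to split the raw text into adjacent sentence pairs and label the second sentence by the discourse marker it starts with, stripping the marker so the model has to predict it from context. The marker-to-class mapping below is only a placeholder; the actual markers and categories are in the paper.

# Hypothetical sketch: build (previous sentence, current sentence, DM class) examples from raw text.
import re

DM_CLASSES = {'however': 0, 'therefore': 1, 'for example': 2}  # placeholder mapping, not the paper's

def make_dm_examples(text):
    sentences = re.split(r'(?<=[.!?])\s+', text)
    examples = []
    for prev, curr in zip(sentences, sentences[1:]):
        lower = curr.lower()
        for marker, label in DM_CLASSES.items():
            if lower.startswith(marker):
                # strip the marker so the model must predict it from the sentence pair
                stripped = curr[len(marker):].lstrip(' ,')
                examples.append((prev, stripped, label))
                break
    return examples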

GabrielLin commented 4 years ago

Could you please share your training code for the DM prediction task? I would like to create my own data set and try it.

Farahn commented 4 years ago

The layers and loss functions are in the code:

...
num_classes_s = 8
...

# first classifier for the first task, using the representation from attention_output
# adding more layers
attention_output_sentorder = tf.reshape(attention_output, [-1, HIDDEN_SIZE*2*2*3])
Ws1 = tf.Variable(tf.truncated_normal([HIDDEN_SIZE*2*2*3, LAYER_1], stddev=0.1))
bs1 = tf.Variable(tf.truncated_normal([LAYER_1]))
y_hats1 = tf.nn.xw_plus_b(attention_output_sentorder, Ws1, bs1)
W_s2 = tf.Variable(tf.truncated_normal([LAYER_1, LAYER_2], stddev=0.1))
b_s2 = tf.Variable(tf.truncated_normal([LAYER_2]))
y_hat_s2 = tf.nn.xw_plus_b(y_hats1, W_s2, b_s2)

W_s = tf.Variable(tf.truncated_normal([LAYER_2, num_classes_s], stddev=0.1))
b_s = tf.Variable(tf.truncated_normal([num_classes_s]))
y_hat_s = tf.nn.xw_plus_b(y_hat_s2, W_s, b_s)
y_preds_s = tf.argmax(y_hat_s, axis=1)
loss_s = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_hat_s, labels=target_ph_s))

...

# second optimizer for sentence order
gradients_s = tf.gradients(loss_s, params)
clipped_gradients_s, _ = tf.clip_by_global_norm(gradients_s, max_gradient_norm)
optimizer_s = optimizer.apply_gradients(zip(clipped_gradients_s, params))

Depending on how you format your input data, you can feed it in and train using optimizer_s.
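For example, a minimal training-loop sketch; the names input_ph, batch_x, batch_y, get_batches, train_data, batch_size, and num_epochs are assumptions about how you feed your data, so adapt them to your graph:

# TensorFlow 1.x; optimizer_s, loss_s, and target_ph_s come from the snippet above,
# input_ph and get_batches are placeholders for your own input pipeline
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
for epoch in range(num_epochs):
    for batch_x, batch_y in get_batches(train_data, batch_size):
        _, batch_loss = sess.run([optimizer_s, loss_s],
                                 feed_dict={input_ph: batch_x, target_ph_s: batch_y})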

I hope this helps.

GabrielLin commented 4 years ago

You are extremely helpful. Thank you very much.