JRC1995 / Abstractive-Summarization

Implementation of abstractive summarization using LSTM in the encoder-decoder architecture with local attention.
MIT License

Should saver.save be used before incrementing the next epoch value? #9

Closed: hack1234567 closed this issue 6 years ago

hack1234567 commented 6 years ago

Should saver.save be used before incrementing the next epoch value? I.e., should it be saved just before step = step + 1 or just after?

JRC1995 commented 6 years ago

It won't matter whether you save just before step = step + 1 or after.

hack1234567 commented 6 years ago

    _, loss, pred = sess.run([optimizer, cost, prediction],
                             feed_dict={tf_text: train_texts[i],
                                        tf_seq_len: len(train_texts[i]),
                                        tf_summary: train_out,
                                        tf_output_len: len(train_out)})

For testing, I just have to feed test data to tf_text and tf_seq_len, and everything else stays the same? When I try to do it after removing the loss and optimizer and restoring model.ckpt, it's not working: it is again training from iteration 0.

This is where it was saved:

    step = step + 1
    saver.save(sess, 'Model_Backup/model.ckpt')

    saver = tf.train.Saver()
    saver.restore(sess, 'Model_Backup/model.ckpt')
    sess.run(tf.global_variables())
    total_loss = 0
    total_acc = 0
    total_val_loss = 0
    total_val_acc = 0

    for i in range(0, len(test_texts)):
        # ... rest of the code

and then test_texts is fed, with train_out already calculated from the test_summaries:

    pred = sess.run([prediction], feed_dict={tf_text: test_texts[i], tf_seq_len: len(test_texts[i] ......etc......
JRC1995 commented 6 years ago

For testing, I just have to feed test data to tf_text and tf_seq_len, and everything else stays the same?

Should be.

When I try to do it after removing the loss and optimizer and restoring model.ckpt, it's not working: it is again training from iteration 0.

Show me the whole code of the training-testing section. And tell me exactly what you are doing, i.e., are you saving, then quitting, then starting again, or what?

I have a couple of points that I can suggest checking (without knowing what's going on):

Check that you are not accidentally overwriting the saved model with a new model. You have the save code before the load code.

So if you run the program from top to bottom, you will save first then load.

If you save, then quit, and then run again, then before loading the saved model, the model will of course start training from the beginning. A new model will start to form, which will then be saved. This new, barely trained model will then overwrite your older saved model. And then load will simply load the new model, which was barely trained for one epoch or whatever.

Also, what do you mean by training beginning from iteration 0? The value of epoch/step is not being stored, so it will always appear to begin at iteration 0 even if you properly loaded a saved model.

If you want to verify, try to match the loss or something. Calculate the average loss over all data for the saved model. Then calculate the average loss over all data for the loaded model. Match them to see if your model is loaded.
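A minimal sketch of that check, assuming the cost op, placeholders, and data lists from the training script (avg_loss and train_summaries here are illustrative names, not necessarily what your code uses):

    # Hypothetical helper: average loss over all training data, without optimizing.
    def avg_loss(sess):
        total = 0.0
        for i in range(len(train_texts)):
            train_out = transform_out(train_summaries[i][0:len(train_summaries[i]) - 1])
            total += sess.run(cost, feed_dict={tf_text: train_texts[i],
                                               tf_seq_len: len(train_texts[i]),
                                               tf_summary: train_out,
                                               tf_output_len: len(train_out)})
        return total / len(train_texts)

    loss_saved = avg_loss(sess)                    # just after saver.save(...)
    saver.restore(sess, 'Model_Backup/model.ckpt')
    loss_loaded = avg_loss(sess)                   # should match loss_saved
    print(loss_saved, loss_loaded)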

hack1234567 commented 6 years ago

Also, what do you mean by training beginning from iteration 0? The value of epoch/step is not being stored, so it will always appear to begin at iteration 0 even if you properly loaded a saved model.

What I meant was: if there arises an occasion to showcase the capability of the model to someone else, is it possible to test it directly by giving it unseen data, rather than training it each time, which can take days? More like a separate file only for testing. If saving were possible even after quitting, like an equivalent of pickling the saved parameters to some file, it would be easy to showcase the model.

JRC1995 commented 6 years ago

What I meant was: if there arises an occasion to showcase the capability of the model to someone else, is it possible to test it rather than train it each time, which can take days?

Yes, it's possible. What is necessary is storing the model parameters, which is what the TensorFlow saver function does. You can load the model parameters and resume training, or use them for prediction to show the model's capacity.

Here I loaded a pre-trained model and used it for prediction without retraining: https://github.com/JRC1995/Wide-Residual-Network/blob/master/OLD/Predict-lite.ipynb

The value of the iteration won't be saved by the TensorFlow saver function, but you can save that value to some external file too, using some plain Python code. If the iteration value is not saved, but the model is saved and loaded, training will only superficially 'appear' to begin from iteration 0 without actually losing the progress of previous training.
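For example, the iteration value could be kept in a small text file next to the checkpoint (a rough sketch; the filename is arbitrary):

    import os

    STEP_FILE = 'Model_Backup/step.txt'  # arbitrary companion file for the counter

    def save_step(step):
        with open(STEP_FILE, 'w') as f:
            f.write(str(step))

    def load_step():
        if os.path.exists(STEP_FILE):
            with open(STEP_FILE) as f:
                return int(f.read())
        return 0  # no saved counter yet

    step = load_step()   # restore the iteration count along with the model
    # ... training ...
    save_step(step)      # write it out whenever you call saver.save(...)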

If you aren't able to load the saved model properly, it's possible there's some logical error in your code, maybe something to do with the order of execution.

hack1234567 commented 6 years ago

So, for testing as a separate file, everything in the training segment has to be copied, and the only change would be feeding test data? Other than that, the RRA, LSTM, loss calculation, output, everything would be the same, right?

JRC1995 commented 6 years ago

You should be able to test on the same file.

JRC1995 commented 6 years ago

Try creating a Python bool variable load. Keep it at False if you don't want to load any previously saved model or if there isn't any saved model; otherwise, set load to True manually.

Then create a conditional statement (a basic Python if-else statement):

If load is True, load the model.

If load is False, don't load; initialize the variables instead.

A rough pseudo-code:

    load = True  # or False, depending on what you want
    saver = tf.train.Saver()

    if load == True:
        saver.restore(sess, 'Model_Backup/model.ckpt')
        sess.run(tf.global_variables())
    else:
        init = tf.global_variables_initializer()
        sess.run(init)

Start the training epochs after this code, and save the model in between the training loops wherever you want.

Validate if you wish, periodically.

And in the end use test data.

The save-load is useful if the program is terminated and you wish to resume from the previous save-point. If the model is saved and load=True, the model will simply be loaded before training starts the next time you run the program.
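Putting those pieces together, the loop might look roughly like this (a sketch; the epoch count and the save interval are placeholder values):

    saver = tf.train.Saver()

    if load == True:
        saver.restore(sess, 'Model_Backup/model.ckpt')  # resume from the save-point
    else:
        sess.run(tf.global_variables_initializer())     # fresh model

    step = 0
    for epoch in range(100):                            # placeholder epoch count
        for i in range(len(train_texts)):
            # ... sess.run([optimizer, cost, prediction], feed_dict={...}) ...
            step = step + 1
            if step % 500 == 0:                         # placeholder save interval
                saver.save(sess, 'Model_Backup/model.ckpt')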

hack1234567 commented 6 years ago

    if load == True:
        saver.restore(sess, 'Model_Backup/model.ckpt')
        sess.run(tf.global_variables())

Don't you think the test data code should be written here, since we already have the model? Also, when I do that I'm getting the following error:

    train_out = transform_out(test_summaries[i][0:len(test_summaries[i]) - 1])

IndexError: list index out of range

Here, i is in range 0 to 100.

JRC1995 commented 6 years ago

Don't you think the test data code should be written here, since we already have the model?

You can write the testing code there if you don't want to train any more. Normally testing is done after all training is complete.

    train_out = transform_out(test_summaries[i][0:len(test_summaries[i]) - 1])

IndexError: list index out of range

I don't know why that's happening. Is the error from that specific line, and not from somewhere within the transform_out function? Then I'm guessing the error is associated with test_summaries[i][0:len(test_summaries[i]) - 1]. Was this error happening from the beginning, or after you made some change? If i is in range 0 to 100, are you sure there are exactly 100 data samples in test_summaries?

This error happens when the program tries to access data at a list index that doesn't exist.
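For example:

    xs = [10, 20, 30]   # len(xs) == 3, so valid indices are 0, 1, 2
    print(xs[2])        # fine
    print(xs[3])        # IndexError: list index out of range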

hack1234567 commented 6 years ago

Update: I'm able to run it when j is in range(trainlen, testlen), but the program immediately finishes executing and returns with 0. It is not printing any of the texts or summaries...

JRC1995 commented 6 years ago

Can you post the whole training-testing section code?

    if load == True:
        saver.restore(sess, 'Model_Backup/model.ckpt')
        sess.run(tf.global_variables())
        print("hurrah the model is found")
        step = 0
        loss_list = []
        acc_list = []
        # ... all the variables

You can put step = 0, loss_list = [], etc. after the if-else statement (outside it). You can make the if-else statement simply determine which model to run: if load = False, a fresh model will be initialized (more specifically, the training parameters will be randomly initialized based on the specifications); if load = True, the saved model will be loaded. Everything else may be better kept outside this if-else statement.

JRC1995 commented 6 years ago

Update: I'm able to run it when j is in range(trainlen, testlen), but the program immediately finishes executing and returns with 0. It is not printing any of the texts or summaries...

But why would you do that? It is probably not printing anything because trainlen is greater than testlen; I am not sure what will happen in that case. Also, what is j's initial value? I think the condition "j in range(trainlen, testlen)" is returning False.

but here test_summaries = vec_summaries_reduced[train_len+val_len:len(vec_summaries_reduced)]; is it because j is not properly indexing the test_summaries list, since it starts from trainlen+vallen?

Check print(len(test_summaries[j])). Also check print(len(test_summaries)).

or something.

JRC1995 commented 6 years ago

    if load == True:
        # testing
        saver.restore(sess, 'Model_Backup/model.ckpt')
        sess.run(tf.global_variables())
        print("hurrah the model is found")

Training can take days or at least hours. You may need to shut down the program in the middle. Or the program may terminate due to some accident.

That's why you should have a mechanism to resume training from the last save-point. Don't just use the loading function for testing.

So it's better if you do it like this:

    saver = tf.train.Saver()

    if load == True:
        saver.restore(sess, 'Model_Backup/model.ckpt')
        sess.run(tf.global_variables())
        print("hurrah the model is found")
        # --- You may put the testing code here if you want to test the loaded model ---
    else:
        print("we will be training from the beginning, boys")
        init = tf.global_variables_initializer()
        sess.run(init)

Do the training after this and OUTSIDE the if-else scope.

    for j in range(train_len, testlen):

It should be range(0, testlen).

Your code is immediately exiting because it isn't even getting inside the for loop. That's why there is no error: it's just skipping the loop.
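For example (made-up lengths): when the start of a range is not below its end, the range is empty and the loop silently does nothing:

    trainlen, testlen = 120, 100
    for j in range(trainlen, testlen):  # range(120, 100) is empty
        print(j)                        # never executes; no error either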

    train_out = transform_out(test_summaries[i][0:len(test_summaries[i]) - 1])

If that's giving you problems, print len(test_summaries) before this, print testlen, and see if they're equal. Check whether testlen is obtained through proper means.

Also, if your test_summaries or train_summaries do not end with the end-of-summary token, you don't need to use the -1; just use train_out = transform_out(test_summaries[i][0:len(test_summaries[i])]). I don't exactly remember why I used -1, but I think it was to ignore that token.

    _, loss, pred = sess.run([optimizer, cost, prediction],
                             feed_dict={tf_text: test_texts[j],
                                        tf_seq_len: len(test_texts[j]),
                                        tf_summary: train_out,
                                        tf_output_len: len(train_out)})

Don't sess.run the optimizer: that will start backpropagation. Remember, you are testing here, not training the model.

use:

    loss, pred = sess.run([cost, prediction],
                          feed_dict={tf_text: test_texts[j],
                                     tf_seq_len: len(test_texts[j]),
                                     tf_summary: train_out,
                                     tf_output_len: len(train_out)})

or something like that.

Those are my suggestions for now. I can't tell what exactly is causing the index out of range. The issue might be present much earlier, where test_summaries is being generated. That's why you should print and diagnose.

Or check the earlier code. Make sure that there's a value at test_summaries[testlen - 1].

That's it for now.

Also, note: what you are trying to do is OK if you just want to test the save-load function. But to properly complete the program you need to use validation data too. Saving should be coordinated with validation. We usually choose a model not based on training loss or accuracy, but based on validation loss/accuracy. So to make sure the best model is saved in the end, you can save only the model that has the lowest validation loss, or something like that. Test data is used when all is said and done, at the very end, on the best chosen model. But we can think about that later. Try to troubleshoot the issue with the test_summary list index out of range.
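A rough sketch of that save policy (assuming a validation pass that computes val_loss each epoch; best_val_loss is a hypothetical variable):

    best_val_loss = float('inf')

    for epoch in range(100):            # placeholder epoch count
        # ... training pass ...
        # ... validation pass that computes val_loss ...
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            saver.save(sess, 'Model_Backup/model.ckpt')  # keep only the best model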

Tell me the print results.

hack1234567 commented 6 years ago

encoded_hidden (i.e., the concatenated array):

Here I try to print it after it is returned through the model; what's printed is the tensor object:

    Tensor("concat_6:0", shape=(?, 1000), dtype=float32)

How do I print the value?

JRC1995 commented 6 years ago

Did you print it within sess.run?

hack1234567 commented 6 years ago

tf.Print works

JRC1995 commented 6 years ago

ok,

I remember, I used to print using something like this after returning from the function:

        # Run optimization operation (backpropagation)
        _,loss,pred,enc_hidden = sess.run([optimizer,cost,prediction,encoded_hidden],feed_dict={tf_text: train_texts[i], 
                                                tf_seq_len: len(train_texts[i]), 
                                                tf_summary: train_out,
                                                tf_output_len: len(train_out)})

        print(enc_hidden)

tf.Print sounds like a new API. Good if it works.
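For reference, tf.Print returns a tensor identical to its input and logs the listed values each time that tensor is evaluated, so it has to stay in the graph's execution path (a sketch; the message string is arbitrary):

    # Wrap the tensor; the print happens whenever encoded_hidden is evaluated.
    encoded_hidden = tf.Print(encoded_hidden, [encoded_hidden],
                              message="encoded_hidden = ", summarize=10)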

hack1234567 commented 6 years ago

print(enc_hidden) is not working for me. When I try to print output (the array) inside that sess.run part, I am getting:

..... (some list values) 0.04142889, 0.01834947]], dtype=float32) has invalid type <class 'numpy.ndarray'>, must be a string or Tensor. (Can not convert a ndarray into a Tensor or Operation.)

But tf.Print(x, [x]) worked for output, since it's a TensorArray... I am getting a list of lists.

But even this is not working for anything else. I tried using enc_hidden.stack, and without stacking, but no result. What exactly is enc_hidden, since it has (?, 1000) as shape? And it's not being printed with tf.Print or in sess.run. Even decoded_hidden or hiddenresidual, nothing is printed...

and

    def cond_pred(i, pred):
        return i < tf_output_len

    def body_pred(i, pred):
        pred = pred.write(i, tf.cast(tf.argmax(output[i]), tf.int32))
        return i + 1, pred

Is this where the output is limited to be equal to the input sequence length?

JRC1995 commented 6 years ago

But tf.Print(x, [x]) worked for output, since it's a TensorArray... I am getting a list of lists.

encoded_hidden is not a TensorArray. A TensorArray is like a dynamic array. You are getting a tensor.

But even this is not working for anything else. I tried using enc_hidden.stack, and without stacking, but no result. What exactly is enc_hidden, since it has (?, 1000) as shape? And it's not being printed with tf.Print or in sess.run. Even decoded_hidden or hiddenresidual, nothing is printed...

(?, 1000) is the shape because the ? part depends on the length of the sequence; the length of the sequence is fed through a placeholder at runtime.

Check my code:

    _, loss, pred, enc_hidden = sess.run([optimizer, cost, prediction, encoded_hidden],
                                         feed_dict={tf_text: train_texts[i],
                                                    tf_seq_len: len(train_texts[i]),
                                                    tf_summary: train_out,
                                                    tf_output_len: len(train_out)})

    print(enc_hidden)

First, you have to return encoded_hidden from the model function, like I mentioned before. Then make the changes shown in the code above. Then print(enc_hidden). This code worked for me; I posted it after checking it.

hack1234567 commented 6 years ago

Does calculating the loss here make sense, since it won't check whether it's semantically similar or similar in meaning? Also, when we are converting to a logits-compatible format, say the words are "hello world", then it can return indices 20, 2 if "hello" is found at the 20th position and "world" at the 2nd? But how does that help in loss calculation? Since the output won't be in that form, how is comparison with the output possible?

    tf_summary: train_out, tf_output_len: len(train_out)

Shouldn't tf_summary be given the vectorised form of the output? But here it would be given [20, 2, ...]?

Also, in the align function, why is sigma = tf.constant(D/2, dtype=tf.float32)? Why is it D/2?

Also, what is the use of doing the following:

    comp_1 = tf.cast(tf.square(pos - pt), tf.float32)
    comp_2 = tf.cast(2 * tf.square(sigma), tf.float32)
    pd = pd.write(i, tf.exp(-(comp_1 / comp_2)))

JRC1995 commented 6 years ago

Does calculating the loss here make sense, since it won't check whether it's semantically similar or similar in meaning?

Where?

Are you talking about my code for printing encoded_hidden? It will check everything, because the data is being fed. encoded_hidden is formed only when data is fed; that's when its value is calculated. That's why we can't print encoded_hidden independently.

Anyway, if you ONLY want to print encoded_hidden, you can do this:

    enc_hidden = sess.run([encoded_hidden],
                          feed_dict={tf_text: train_texts[i],
                                     tf_seq_len: len(train_texts[i]),
                                     tf_summary: train_out,
                                     tf_output_len: len(train_out)})

    print(enc_hidden)

..........

Also, when we are converting to a logits-compatible format, say the words are "hello world", then it can return indices 20, 2 if "hello" is found at the 20th position and "world" at the 2nd?

yes

But how does that help in loss calculation? Since the output won't be in that form, how is comparison with the output possible?

I don't know exactly how the tf function works. But you have to understand that 20, 2 has all the necessary information for the comparison.

The tf sparse function may be a more efficient version of the tf loss function that takes one-hot encoded vectors as labels.

The tf sparse function may be treating 20, 2 as one-hot encoded vectors too...

Let's say the vocabulary length is three.

The model output is:

0.1 0.8 0.1 (probability distribution of the 1st word)
0.1 0.1 0.8 (probability distribution of the 2nd word)
0.1 0.8 0.1 (probability distribution of the 3rd word)

The 0.1 in the 1st word's probability distribution indicates the probability of the word at vocabulary index 0; the 0.8 indicates the probability of the word at vocabulary index 1; and so on.

The target output is: tigers and lions.

Let's say "tigers" is at position 0, "and" is at position 1, and "lions" is at position 2 of the vocabulary.

So "tigers and lions" will become 0, 1, 2 (like "Hello World" was 20, 2 in your example).

Now, 0 (from 0, 1, 2) can be treated as the one-hot encoded vector 1 0 0 (0 is the first index, so there is a 1 at the first index indicating that's the answer; the other indices are 0. The length of the vocabulary can be retrieved by checking the length of the probability distribution).

So 0.1 0.8 0.1 can be compared with 0, by treating the 0 as 1 0 0 (the ideal answer: maximum probability of 1 for the word at index 0, i.e., "tigers", and minimum probability, 0, for everything else).

Same for the other words.
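A toy version of that comparison with tf.nn.sparse_softmax_cross_entropy_with_logits (a sketch; the logits are made up, so the numbers are only illustrative):

    import numpy as np
    import tensorflow as tf

    # Logits for 3 output words over a 3-word vocabulary; softmax of each row
    # gives roughly the distributions from the example above.
    logits = tf.constant(np.log([[0.1, 0.8, 0.1],
                                 [0.1, 0.1, 0.8],
                                 [0.1, 0.8, 0.1]]), dtype=tf.float32)
    labels = tf.constant([0, 1, 2])  # "tigers and lions" as vocabulary indices

    # Each integer label is treated like the one-hot target described above.
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                            logits=logits)

    with tf.Session() as sess:
        print(sess.run(losses))  # per-word cross-entropy values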

Also, in the align function, why is sigma = tf.constant(D/2, dtype=tf.float32)? Why is it D/2?

I don't really know why. I used it because that was the value used in the research paper.

Ok, I will try to check the paper again to provide a better answer:

Here's an excerpt from the paper:

We use the same align function as in Eq. (7) and the standard deviation is empirically set as σ= D/2

The paper mentions that sigma = D/2 is empirically set. I am not totally sure what they meant by "empirically" in this context.

Normally something is known empirically if the knowledge comes from the external world, or sense-data, as opposed to pure logic and reason.

So I am assuming the value of sigma is chosen based on "observation" (of sense data), i.e., it might be determined by trial and error, experimenting with which value gives a better result.

That sounds as if sigma is treated as a hyperparameter here. That's my rough assumption about the situation, so take it with a grain of salt.

    comp_1 = tf.cast(tf.square(pos - pt), tf.float32)
    comp_2 = tf.cast(2 * tf.square(sigma), tf.float32)
    pd = pd.write(i, tf.exp(-(comp_1 / comp_2)))

This is the implementation of the formulas presented in the paper (for local attention). This specific formula represents a Gaussian (normal) distribution.

Excerpt from the paper:

To favor alignment points near pt, we place a Gaussian distribution centered around pt.

In the code, comp_1 is basically (position - pt)^2.

The position is given as s in the paper, so it's (s - pt)^2.

comp_2 is 2 * sigma^2.

pd is exp(-(s - pt)^2 / (2 * sigma^2)).

normalized_scores in the code is basically the result of what is given as align(ht, hs) in the paper.

So in the code:

    G = tf.multiply(normalized_scores, pd)

This is equivalent to the formula in the paper:

at(s) = align(ht, h̄s) * exp(-(s - pt)^2 / (2 * sigma^2))
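A small NumPy sketch of that weighting, assuming a 1-D array of alignment scores (the numbers are made up):

    import numpy as np

    def local_attention_weights(scores, pt, D):
        # scores: normalized align(ht, hs) for each source position s
        sigma = D / 2.0                     # standard deviation, as in the paper
        s = np.arange(len(scores))          # source positions
        pd = np.exp(-np.square(s - pt) / (2 * sigma ** 2))  # Gaussian centred at pt
        return scores * pd                  # at(s) = align(ht, hs) * exp(...)

    print(local_attention_weights(np.array([0.2, 0.5, 0.2, 0.1]), pt=1.3, D=2))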

Link to the paper if you need it: https://nlp.stanford.edu/pubs/emnlp15_attn.pdf