Open smutahoang opened 5 years ago
Hi,
I'm sorry, I don't quite understand what you are asking for. You will have to initialise both the forward and the backward states at the start to begin the training process.
Please get back to me with a bit more clarity so that I can help you out.
Thanks.
Let's use last_h_S = (last_h_forward, last_h_backward) to denote the hidden state of the last word in sentence number S, and inith[S+1] to denote the initial hidden state of sentence number S+1.
From the code, I understand that you assign inith[S+1] = last_h_S = (last_h_forward, last_h_backward) (am I right?). Wouldn't it be more reasonable to set inith[S+1] = (last_h_forward, 0)?
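To make the suggestion concrete, here is a minimal sketch of what I mean (just an illustration, assuming a 1-layer bidirectional GRU whose hidden state has shape (2, batch, hidden), with index 0 the forward direction and index 1 the backward direction):

```python
import torch

def init_next_sentence_state(last_state):
    """Build inith[S+1] from last_h_S: keep the forward half, zero the backward half."""
    # last_state: hidden state after the last word of sentence S,
    # shape (2, batch, hidden) for a 1-layer bidirectional GRU
    next_state = torch.zeros_like(last_state)
    next_state[0] = last_state[0]  # forward state carries over, since sentences S and S+1 are consecutive
    # next_state[1] stays zero: the backward state summarizes sentence S read right-to-left,
    # which does not seem like a meaningful starting point for sentence S+1
    return next_state
```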
Hi,
Thanks for sharing your implementation. This helps me a lot.
I just wonder about the way you initialize the hidden state from the second sentence onward. Precisely, in the "def train_data(mini_batch, targets, word_attn_model, sent_attn_model, word_optimizer, sent_optimizer, criterion):" function (in the "attention_model_validation_experiments" notebook), you currently use a loop over the sentences: "_s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)". That means both the forward and backward states of the last word in sentence i are used to initialize the forward and backward states of sentence i+1. I can understand this for the forward state, since the two sentences are consecutive, but the backward-state initialization does not seem very reasonable.
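To make sure I am reading it correctly, here is roughly the pattern I mean (a simplified paraphrase of the loop, not copied verbatim from the notebook; the init_hidden call and the zero-initialization before the first sentence are my guesses at the surrounding code):

```python
# Simplified paraphrase of the training loop as I understand it (not verbatim from the notebook):
state_word = word_attn_model.init_hidden()  # assuming some zero-initialization before the first sentence
for i in range(max_sents):                  # loop over the sentences of a document
    # state_word returned for sentence i (both forward and backward directions)
    # is passed straight in as the initial state for sentence i+1
    _s, state_word, _ = word_attn_model(mini_batch[i, :, :].transpose(0, 1), state_word)
```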
Can you please explain this in more detail? Thanks.