EdGENetworks / attention-networks-for-classification

Hierarchical Attention Networks for Document Classification in PyTorch

Init hidden state for the 2nd sentence onward #15

Open smutahoang opened 5 years ago

smutahoang commented 5 years ago

Hi,

Thanks for sharing your implementation. This helps me a lot.

I am just wondering about the way you initialize the hidden state for the second sentence onward. Specifically, in the `train_data(mini_batch, targets, word_attn_model, sent_attn_model, word_optimizer, sent_optimizer, criterion)` function (in the `attention_model_validation_experiments` notebook), you loop over the sentences with `_s, state_word = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)`. That means both the forward and backward states of the last word in sentence i are used to initialize the forward and backward states of sentence i+1. I can understand the case for the forward state, as the two sentences are consecutive, but the backward state initialization does not seem very reasonable.
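For reference, here is a minimal, self-contained sketch of the pattern I mean, using a plain bidirectional GRU as a stand-in for your word-level attention model (names and shapes here are my assumptions, not your exact code):

```python
import torch
import torch.nn as nn

# Stand-in for the word-level model: a single-layer bidirectional GRU.
hidden_size, batch_size, num_sentences, num_words, emb_dim = 50, 4, 3, 7, 20
word_gru = nn.GRU(emb_dim, hidden_size, bidirectional=True)

# Hidden state shape (num_layers * num_directions, batch, hidden):
# index 0 along dim 0 is the forward direction, index 1 the backward one.
state_word = torch.zeros(2, batch_size, hidden_size)

# (sentences, words, batch, emb_dim), already word-major as the GRU expects.
mini_batch = torch.randn(num_sentences, num_words, batch_size, emb_dim)

for i in range(num_sentences):
    # Both halves of state_word (forward AND backward) from the last word
    # of sentence i seed the hidden state of sentence i + 1 -- this carrying
    # over of the backward half is what I am asking about.
    _s, state_word = word_gru(mini_batch[i], state_word)
```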

Can you please explain this in more detail? Thanks.

Sandeep42 commented 5 years ago

Hi,

I'm sorry, but I don't quite understand what you are asking. You have to initialise both the forward and the backward states to start the training process.
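To be concrete, that initialisation is something like the following for a single-layer bidirectional GRU (a minimal sketch; the shapes are assumed, not copied from the repo):

```python
import torch

# (num_layers * num_directions, batch, hidden_size): dim 0 stacks the
# forward and backward directions, both zeroed before the first sentence.
def init_hidden(batch_size, hidden_size):
    return torch.zeros(2, batch_size, hidden_size)
```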

Please get back to me with a bit more detail so that I can help you out.

Thanks.

smutahoang commented 5 years ago

Let's use last_h[S] = (last_h_forward, last_h_backward) to denote the hidden states of the last word in sentence number S, and init_h[S+1] to denote the initial hidden states for sentence number S+1.

From the code, I understand that you assign init_h[S+1] = last_h[S] = (last_h_forward, last_h_backward) (am I right?). Wouldn't it be more reasonable to set init_h[S+1] = (last_h_forward, 0)?
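In code, the proposal would be something like the hypothetical helper below (not from the repo; it assumes the standard PyTorch layout where, along dim 0 of the hidden state, even indices hold forward-direction states and odd indices hold backward-direction states):

```python
import torch

def keep_forward_reset_backward(state_word):
    # state_word: (num_layers * 2, batch, hidden) from a bidirectional GRU.
    # Keep the forward-direction states (even indices along dim 0) and zero
    # the backward-direction ones (odd indices): the backward pass over
    # sentence S + 1 runs right-to-left, so it should not be seeded with
    # information flowing out of sentence S.
    new_state = state_word.clone()
    new_state[1::2] = 0.0
    return new_state

# Inside the sentence loop, before feeding sentence S + 1:
# state_word = keep_forward_reset_backward(state_word)
```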