EdGENetworks / attention-networks-for-classification

Hierarchical Attention Networks for Document Classification in PyTorch

transpose? #9

Open hungpthanh opened 7 years ago

hungpthanh commented 7 years ago

Why do you need the transpose here: `_s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)`

and here: `torch.from_numpy(main_matrix).transpose(0,1)` in `def pad_batch`?

Thanks :)

Sandeep42 commented 7 years ago

I think the transpose was used because PyTorch expects the batch_size in the second dimension; it's been a while since I coded this. But I did check all the dimensions from start to end when I developed it. :)
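For context, PyTorch's recurrent layers default to `batch_first=False`, so they expect input of shape `(seq_len, batch, input_size)`, i.e. the batch in the second dimension. A minimal sketch of that layout (the sizes below are made up for illustration, not the ones used in this repo):

```python
import torch
import torch.nn as nn

# With batch_first=False (the default), a GRU expects (seq_len, batch, input_size).
gru = nn.GRU(input_size=100, hidden_size=50)

seq_len, batch_size = 20, 32
x = torch.randn(seq_len, batch_size, 100)   # time-major layout: batch in dim 1
h0 = torch.zeros(1, batch_size, 50)         # (num_layers, batch, hidden_size)

out, hn = gru(x, h0)
print(out.shape)   # torch.Size([20, 32, 50])
```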

hungpthanh commented 7 years ago

Thank you so much :+1:

gabrer commented 6 years ago

@Sandeep42 @hungthanhpham94 I wonder whether there is an error here, given what PyTorch is expecting.

In the function train_data(), it's written:

 for i in xrange(max_sents):
        _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

In this way, after `.transpose(0,1)`, the `mini_batch[i,:,:]` slice has size `(max_tokens, batch_size)`.

However, the first function to be called is `self.lookup(embed)`, which expects a `(batch_size, list_of_indices)` input.

If this is correct, it would require fixing up all the following code.
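For reference, a quick shape check with stand-in sizes (hypothetical, not the repo's actual dimensions): `torch.nn.Embedding` accepts an index tensor of any shape and appends the embedding dimension, so a `(max_tokens, batch_size)` input comes out as `(max_tokens, batch_size, embed_dim)`, which is the time-major layout a default (`batch_first=False`) GRU expects:

```python
import torch
import torch.nn as nn

max_tokens, batch_size, embed_dim, vocab_size = 20, 32, 100, 5000

# mini_batch[i,:,:] is (batch_size, max_tokens); after .transpose(0, 1)
# the word indices are laid out as (max_tokens, batch_size).
word_indices = torch.randint(0, vocab_size, (max_tokens, batch_size))

lookup = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim)
embedded = lookup(word_indices)
print(embedded.shape)   # torch.Size([20, 32, 100])

# That layout matches what nn.GRU with batch_first=False (the default) expects.
gru = nn.GRU(input_size=embed_dim, hidden_size=50)
out, hn = gru(embedded)
print(out.shape)        # torch.Size([20, 32, 50])
```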