Open gabrer opened 5 years ago
From what I understand, `self.query_embedding` is initialized as an `nn.Embedding(number_of_context_layers_you_want, dim_of_LSTM)` layer, which has gradients and is therefore trainable.
When you do `sent_w = self.query_embedding(torch.LongTensor(bsize*[0]).cuda()).unsqueeze(2)`, you are not randomly initializing it on each pass. The `[0]` selects the first row of the embedding table you declared, so you are actually retrieving the learned parameters of the query embedding. It's effectively `embedding.weight[0]`.
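A minimal sketch of this point (dimensions are illustrative, not the repo's actual values): looking up index 0 in an `nn.Embedding` returns the same learned row on every forward pass, and that row receives gradients like any other parameter:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm_dim = 4  # stand-in for dim_of_LSTM
query_embedding = nn.Embedding(1, lstm_dim)  # one learnable "context" row

bsize = 3
idx = torch.zeros(bsize, dtype=torch.long)  # equivalent of bsize*[0]

# Two forward passes return the SAME values: the lookup is a parameter
# read, not a fresh random initialization.
w1 = query_embedding(idx)
w2 = query_embedding(idx)
assert torch.equal(w1, w2)

# The looked-up row participates in the autograd graph, so it is trained.
w1.sum().backward()
assert query_embedding.weight.grad is not None
```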
I was having a look at the implementation of `InnerAttentionNAACLEncoder`, which should be the sentence encoder from "Hierarchical Attention Networks for Document Classification" by Yang et al. 2016.
However, I would raise the following issues:
- On line 538, it sums up the product of the attention weights `alphas` with the linear projection of the hidden state of each word. However, in the original paper, the weights are multiplied with the original hidden states representing the words, so it should be `sent_output` rather than `sent_output_proj`.
- On line 529, it computes the dot product of the projected hidden state `sent_key_proj` with the so-called context vector `sent_w` (i.e. `u_it` and `u_w`, respectively, in the paper). However, it looks like `sent_w` is instantiated at each iteration with `Variable(torch.LongTensor(bsize*[0]).cuda())`, the input to an embedding layer. I am wondering whether this vector should instead be a model parameter learned during training, as stated in the paper. This part of the code is not very clear to me.
- It uses a BiLSTM, while the paper states they used a BiGRU.
- It extracts `self.pool_type` without using it. Might that be a typo?
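For comparison, here is a minimal sketch of the word-attention step as the paper describes it (`u_it = tanh(W h_it + b)`, `alpha_it = softmax(u_it · u_w)`, `s_i = sum alpha_it * h_it`), with the context vector `u_w` registered as an explicit learned parameter and the weighted sum taken over the original hidden states. Names and dimensions are illustrative, not the repo's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordAttention(nn.Module):
    """Attention from Yang et al. 2016 over BiGRU outputs (sketch)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        # Context vector u_w as an explicit model parameter, learned jointly.
        self.u_w = nn.Parameter(torch.randn(hidden_dim))

    def forward(self, h):
        # h: (batch, seq_len, hidden_dim) -- word-level hidden states
        u = torch.tanh(self.proj(h))         # u_it: (batch, seq, hid)
        scores = u @ self.u_w                # u_it . u_w: (batch, seq)
        alphas = F.softmax(scores, dim=1)    # attention over words
        # Weighted sum over the ORIGINAL hidden states h,
        # not over the projection u.
        return (alphas.unsqueeze(2) * h).sum(dim=1)  # (batch, hid)
```

This makes the two questions above concrete: the final sum uses `h` (the analogue of `sent_output`), and `u_w` is a plain `nn.Parameter` rather than an embedding lookup reconstructed each pass.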