I was just going through the code of the AttnAggregator class and I realised that the self.attn_s layer never receives gradients, because it is not part of the forward pass.
If you check the forward method of that class, you will notice that the attention aggregator always outputs zeros due to the following:
class AttnAggregator(nn.Module):
    ...
    def forward(self, s_hist, s, r, ent_embeds, rel_embeds):
        ...
        # Creates a tensor of zeros
        s_embed_seq_tensor = torch.zeros(len(s_len_non_zero), self.seq_len, 3 * self.h_dim).cuda()
        # Passes the zeros through dropout
        s_embed_seq_tensor = self.dropout(s_embed_seq_tensor)
        # Packs the (still all-zero) s_embed_seq_tensor
        s_packed_input = torch.nn.utils.rnn.pack_padded_sequence(s_embed_seq_tensor,
                                                                 s_len_non_zero,
                                                                 batch_first=True)
        return s_packed_input
Due to this, the Linear layer (the attention layer, self.attn_s) is never applied to the inputs, so its weights won't receive gradients. In other words, the attention layer's weights won't train. You can verify this either by visualising the weights in TensorBoard or simply by printing them.
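A minimal sketch of that check (assuming the aggregator instance is reachable as, say, model.aggregator and that loss.backward() has already been called; these names are placeholders, not the repo's actual variables):

def check_attn_grads(aggregator):
    # Inspect the attention layer's parameters after a backward pass.
    for name, param in aggregator.attn_s.named_parameters():
        if param.grad is None:
            print(f"{name}: no gradient (layer never entered the computation graph)")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.6f}")

# Usage, after loss.backward():
#   check_attn_grads(model.aggregator)

With the forward method shown above, every parameter of self.attn_s should report no gradient, because the packed sequence is built from a fresh zeros tensor that never touches the attention layer.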