INK-USC / RE-Net

Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs (EMNLP 2020)
http://inklab.usc.edu/renet/

The weights in attention layer never change throughout training!!! #6

Closed: sumitpai closed this issue 5 years ago

sumitpai commented 5 years ago

I was just going through the code of the AttnAggregator class and I realised that the self.attn_s layer never receives gradients. This is because it is not part of the forward pass.

If you check the forward method of that class, you will notice that you are always passing zeros as the output of the attention aggregator, due to the following:

class AttnAggregator(nn.Module):
    ...
    def forward(self, s_hist, s, r, ent_embeds, rel_embeds):
        ...
        # Creates a tensor of zeros; it is never overwritten with the attention output
        s_embed_seq_tensor = torch.zeros(len(s_len_non_zero), self.seq_len, 3 * self.h_dim).cuda()

        # Passes the zeros through dropout
        s_embed_seq_tensor = self.dropout(s_embed_seq_tensor)

        # Packs the zero tensor and returns it; self.attn_s is never applied on this path
        s_packed_input = torch.nn.utils.rnn.pack_padded_sequence(s_embed_seq_tensor,
                                                                 s_len_non_zero,
                                                                 batch_first=True)
        return s_packed_input
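
To make the mechanism concrete, here is a minimal, self-contained PyTorch sketch (a toy module with made-up names, not the RE-Net code) of the same pattern: a layer that is defined in __init__ but never called in forward stays off the computation graph, so it never gets a gradient.

import torch
import torch.nn as nn

class ToyAggregator(nn.Module):
    def __init__(self, h_dim=4):
        super().__init__()
        self.attn = nn.Linear(h_dim, h_dim)   # defined but never used below (analogous to self.attn_s)
        self.proj = nn.Linear(h_dim, h_dim)   # stands in for the downstream layers that are used

    def forward(self, x):
        z = torch.zeros_like(x)               # zeros take the place of the attention output
        return self.proj(z)                   # only self.proj ends up on the computation graph

model = ToyAggregator()
model(torch.randn(8, 4)).sum().backward()

print(model.attn.weight.grad)                 # None: no gradient ever reaches the unused layer
print(model.proj.bias.grad is not None)       # True: layers on the graph do receive gradients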

Because the aggregator only ever returns this packed zero tensor, the Linear layer (the attention layer, self.attn_s) is never applied to the inputs, and so its weights never receive gradients. In other words, the attention layer's weights won't train. You can verify this either by visualising the weights in TensorBoard or just by printing them.
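
A quick way to check this on the actual model is to snapshot the attention weights, run one training iteration, and compare. A rough sketch, where attn_layer would be the AttnAggregator's self.attn_s instance and train_step stands for one iteration of the existing training loop (both names are illustrative, not part of the repo):

import torch

def attn_weights_unchanged(attn_layer, train_step):
    # Snapshot the weights, run one forward/backward/optimizer.step(), and compare.
    before = attn_layer.weight.detach().clone()
    train_step()
    return torch.equal(before, attn_layer.weight.detach())

# Example (hypothetical attribute path):
# print(attn_weights_unchanged(model.aggregator_s.attn_s, run_one_training_step))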

woojeongjin commented 5 years ago

We fixed the code. Thanks for your correction!