LiaoLW opened this issue 4 years ago (status: Open)
In the original paper, the temporal block computes P*σ(Q), but in this PyTorch implementation I can only find a sum of three terms. Is something wrong, or am I misunderstanding the paper?
This repo does not follow the original paper exactly.
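To make the difference concrete, here is a minimal sketch in plain Python that contrasts the gated form P*σ(Q) from the question with an additive form. The additive form `relu(A + σ(B) + C)` is only an assumption about what the "sum of three terms" might look like, not a confirmed reading of this repo's code. The names `glu_gate` and `additive_variant` and the toy values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function applied to a single float."""
    return 1.0 / (1.0 + math.exp(-x))


def glu_gate(p, q):
    """Gated form from the question: elementwise P * sigmoid(Q)."""
    return [pi * sigmoid(qi) for pi, qi in zip(p, q)]


def additive_variant(a, b, c):
    """Hypothetical additive form: elementwise relu(A + sigmoid(B) + C).

    This is an assumed shape for a three-term sum, for illustration only.
    """
    return [max(0.0, ai + sigmoid(bi) + ci) for ai, bi, ci in zip(a, b, c)]


# Toy values standing in for the outputs of separate temporal convolutions.
p = [1.0, -2.0, 0.5]
q = [0.0, 3.0, -1.0]
r = [0.2, 0.1, -0.4]

gated = glu_gate(p, q)
summed = additive_variant(p, q, r)

print("gated :", gated)
print("summed:", summed)
```

In the gated form, σ(Q) scales P, so a strongly negative Q can suppress P entirely. In the additive form, σ(B) only shifts the result by a value between 0 and 1, so the two are not equivalent in general.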