hfxunlp closed this issue 5 years ago
Bias is indeed not used in the paper, but I don't think this makes an interesting difference. For example, I remember that the reference implementation (tensor2tensor) did use a bias at some point.
@guillaumekln got it. Thank you for your reply.
The bias in `nn.Linear` is enabled by default. Judging from Sockeye, it seems the bias in MultiHeadAttention, and also here, should be disabled. But I am not sure, since I have difficulty finding the related code in tensor2tensor.
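For reference, here is a minimal sketch of the difference being discussed, assuming PyTorch is installed. The dimension `d_model = 512` and the helper `count_parameters` are illustrative choices only, not code from this repository, Sockeye, or tensor2tensor.

```python
import torch
from torch import nn

d_model = 512  # hypothetical model dimension, chosen only for illustration

# Default construction: nn.Linear allocates a learnable bias vector.
with_bias = nn.Linear(d_model, d_model)

# Passing bias=False drops the bias term, so the layer is a pure matrix product.
without_bias = nn.Linear(d_model, d_model, bias=False)


def count_parameters(module: nn.Module) -> int:
    """Return the total number of parameters held by the module."""
    return sum(p.numel() for p in module.parameters())


print(with_bias.bias is not None)   # the default layer has a bias
print(without_bias.bias is None)    # the bias-free layer stores None instead
# The two layers differ by exactly one bias vector of size d_model.
print(count_parameters(with_bias) - count_parameters(without_bias))
```

The bias adds only `d_model` parameters per projection, which is small next to the `d_model * d_model` weight matrix, so whether it is enabled is unlikely to change the model size much either way.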