fgaoyang opened this issue 5 years ago
@Yang92to Great point. I'll check out the BERT positional embedding method and update ASAP.
@codertimo The BERT positional embedding method simply learns an embedding for each position. So you can use nn.Embedding with a constant input sequence [0, 1, 2, ..., L-1], where L is the maximum sequence length.
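A minimal sketch of what the comment above describes, using `nn.Embedding` over a constant position index sequence. The class and parameter names (`LearnedPositionalEmbedding`, `d_model`, `max_len`) are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """BERT-style learned positional embedding: one trainable vector per position."""

    def __init__(self, d_model, max_len=512):
        super().__init__()
        # A lookup table of max_len position vectors, each of size d_model
        self.pe = nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model) token embeddings
        # Constant position indices [0, 1, ..., seq_len - 1]
        positions = torch.arange(x.size(1), device=x.device)
        # Broadcast the (seq_len, d_model) position vectors over the batch
        return x + self.pe(positions).unsqueeze(0)

emb = LearnedPositionalEmbedding(d_model=16, max_len=128)
tokens = torch.zeros(2, 10, 16)
out = emb(tokens)
print(out.shape)  # torch.Size([2, 10, 16])
```

Unlike the fixed sinusoidal encoding in the original Transformer, these vectors are trained along with the rest of the model.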
@codertimo Since BERT uses learned positional embeddings, and this is one of the biggest differences between the original Transformer and BERT, I think it is quite urgent to modify the positional embedding part.
The position embedding in BERT is not the same as in the original Transformer. Why not use the form from BERT?