Closed — zhupengjia closed this issue 5 years ago
@zhupengjia Can you show me the output of the fixed version of your code?
Thank you for your answer. The output of both versions:

```python
In [1]: import torch, math

In [2]: d_model = 128

In [3]: max_len = 512

In [4]: (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
Out[4]: tensor([1.0000, 0.8660, 0.7499, 0.6494, 0.5623, 0.4870, 0.4217, 0.3652, 0.3162,
        0.2738, 0.2371, 0.2054, 0.1778, 0.1540, 0.1334, 0.1155, 0.1000, 0.0866,
        0.0750, 0.0649, 0.0562, 0.0487, 0.0422, 0.0365, 0.0316, 0.0274, 0.0237,
        0.0205, 0.0178, 0.0154, 0.0133, 0.0115, 0.0100, 0.0087, 0.0075, 0.0065,
        0.0056, 0.0049, 0.0042, 0.0037, 0.0032, 0.0027, 0.0024, 0.0021, 0.0018,
        0.0015, 0.0013, 0.0012, 0.0010, 0.0009, 0.0007, 0.0006, 0.0006, 0.0005,
        0.0004, 0.0004, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001,
        0.0001])

In [5]: (torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)).float().exp()
Out[5]: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [6]: torch.__version__
Out[6]: '1.0.0a0+ff608a9'
```

Please check on your side.
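For anyone else hitting this: in older PyTorch (before mixed-dtype type promotion landed around 1.3), multiplying an integer tensor by a Python float cast the scalar down to int64, so every scaled value truncated to 0 and `exp(0)` gave 1 everywhere. A minimal sketch of the corrected term (on current PyTorch the original code happens to work because of type promotion, but calling `.float()` first is correct on every version):

```python
import math
import torch

d_model = 128

# Cast the arange to float *before* scaling by the float constant, so the
# multiplication is float * float on every PyTorch version.
div_term = (torch.arange(0, d_model, 2).float()
            * -(math.log(10000.0) / d_model)).exp()

print(div_term[0].item())   # exp(0) == 1.0
print(div_term[-1].item())  # ≈ 1.16e-4, i.e. 10000 ** (-126/128)
```

The values should decay geometrically from 1.0 toward roughly 1e-4, matching `Out[4]` above; an all-ones tensor means the truncation bug is present.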
@zhupengjia This is cool — can you make a pull request to add your contribution? Please let me know your thoughts :)
@codertimo Sure~ Just made a pull request.
```python
div_term = (torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)).float().exp()
```
should be:
```python
div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
```
```python
In [51]: (torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)).float().exp()
Out[51]: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```
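For context, the fixed `div_term` feeds the standard sinusoidal positional encoding from "Attention Is All You Need". A sketch of the full table (the function name `sinusoidal_pe` is mine for illustration; the repo's positional-embedding module follows roughly this recipe):

```python
import math
import torch

def sinusoidal_pe(max_len: int = 512, d_model: int = 128) -> torch.Tensor:
    """Build a (max_len, d_model) table of sinusoidal positional encodings."""
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len).float().unsqueeze(1)   # (max_len, 1)
    # The fix discussed above: cast to float before scaling.
    div_term = (torch.arange(0, d_model, 2).float()
                * -(math.log(10000.0) / d_model)).exp()        # (d_model/2,)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

pe = sinusoidal_pe()
```

With the integer-arange bug, `div_term` is all ones and every even column collapses to `sin(position)`, so all positions share one frequency; the cast restores the intended geometric range of wavelengths.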
An additional question: I don't quite understand how the "bidirectional" transformer in the original paper is implemented. Is it like a BiLSTM, i.e. concatenating the outputs of transformers running in two directions? I didn't find a similar structure in your code.