Closed — zhupengjia closed this issue 5 years ago
@zhupengjia Can you show me the output of the fixed version of your code?
Thank you for your answer. The output of both versions:

```python
In [1]: import torch, math

In [2]: d_model = 128

In [3]: max_len = 512

In [4]: (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
Out[4]: tensor([1.0000, 0.8660, 0.7499, 0.6494, 0.5623, 0.4870, 0.4217, 0.3652, 0.3162,
        0.2738, 0.2371, 0.2054, 0.1778, 0.1540, 0.1334, 0.1155, 0.1000, 0.0866,
        0.0750, 0.0649, 0.0562, 0.0487, 0.0422, 0.0365, 0.0316, 0.0274, 0.0237,
        0.0205, 0.0178, 0.0154, 0.0133, 0.0115, 0.0100, 0.0087, 0.0075, 0.0065,
        0.0056, 0.0049, 0.0042, 0.0037, 0.0032, 0.0027, 0.0024, 0.0021, 0.0018,
        0.0015, 0.0013, 0.0012, 0.0010, 0.0009, 0.0007, 0.0006, 0.0006, 0.0005,
        0.0004, 0.0004, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001,
        0.0001])

In [5]: (torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)).float().exp()
Out[5]: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [6]: torch.__version__
Out[6]: '1.0.0a0+ff608a9'
```

Please check on your side.
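For anyone else hitting this: in older PyTorch (before mixed-dtype type promotion landed around 1.3), multiplying an integer tensor by a Python float cast the scalar down to int64, so every scaled value truncated to 0 and `exp(0)` gave 1 everywhere. A minimal sketch of the corrected term (on current PyTorch the original code happens to work because of type promotion, but calling `.float()` first is correct on every version):

```python
import math
import torch

d_model = 128

# Cast the arange to float *before* scaling by the float constant, so the
# multiplication is float * float on every PyTorch version.
div_term = (torch.arange(0, d_model, 2).float()
            * -(math.log(10000.0) / d_model)).exp()

print(div_term[0].item())   # exp(0) == 1.0
print(div_term[-1].item())  # ≈ 1.16e-4, i.e. 10000 ** (-126/128)
```

The values should decay geometrically from 1.0 toward roughly 1e-4, matching `Out[4]` above; an all-ones tensor means the truncation bug is present.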
@zhupengjia This is cool — can you make a pull request to add your contribution? Please let me know your thoughts :)
@codertimo Sure~ Just made a pull request.
```python
div_term = (torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)).float().exp()
```
should be:
```python
div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
```
```python
In [51]: (torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)).float().exp()
Out[51]: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```
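For context, the fixed `div_term` feeds the standard sinusoidal positional encoding from "Attention Is All You Need". A sketch of the full table (the function name `sinusoidal_pe` is mine for illustration; the repo's positional-embedding module follows roughly this recipe):

```python
import math
import torch

def sinusoidal_pe(max_len: int = 512, d_model: int = 128) -> torch.Tensor:
    """Build a (max_len, d_model) table of sinusoidal positional encodings."""
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len).float().unsqueeze(1)   # (max_len, 1)
    # The fix discussed above: cast to float before scaling.
    div_term = (torch.arange(0, d_model, 2).float()
                * -(math.log(10000.0) / d_model)).exp()        # (d_model/2,)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

pe = sinusoidal_pe()
```

With the integer-arange bug, `div_term` is all ones and every even column collapses to `sin(position)`, so all positions share one frequency; the cast restores the intended geometric range of wavelengths.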
An additional question: I don't quite understand how the "bidirectional" transformer in the original paper is implemented. Is it like a BiLSTM, i.e. concatenating the outputs of transformers running in two directions? I didn't find a similar structure in your code.