Open MARD1NO opened 4 years ago
yes, I have the same question. As the paper (Attention is all you need) says:
pe[pos, 2i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
pe[pos, 2i + 1] = math.cos(pos / (10000 ** ((2 * i) / d_model)))
where pos is the position and i is the dimension.
Now that the author use for i in range(0, d_model, 2)
, that means you will jump two steps each iteration.
I think the original code in Embed.py
:
for pos in range(max_seq_len):
for i in range(0, d_model, 2):
pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))
replace by:
for pos in range(max_seq_len):
for i in range(0, d_model, 2):
pe[pos, i] = math.sin(pos / (10000 ** (i / d_model)))
pe[pos, i + 1] = math.cos(pos / (10000 ** (i / d_model)))
In paper, the cos value should be