SamLynnEvans / Transformer

Transformer seq2seq model: a program that can build a language translator from a parallel corpus
Apache License 2.0

A little error in Positional Encoder #25

Open MARD1NO opened 4 years ago

MARD1NO commented 4 years ago
pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))

In the paper, the cos value should be:

pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * i) / d_model)))
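
As a quick numerical illustration of the mismatch (a minimal check I added, assuming d_model = 512 and an arbitrary position; this is not code from the repo):

import math

d_model = 512
pos, i = 1, 0

repo_cos = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))  # current code: exponent uses i + 1
paper_cos = math.cos(pos / (10000 ** ((2 * i) / d_model)))       # paper: exponent uses i

print(repo_cos, paper_cos)  # the two values differ, so the cos frequencies are shifted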
sparks-heng commented 3 years ago

Yes, I have the same question. As the paper (Attention Is All You Need) says:

pe[pos, 2i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
pe[pos, 2i + 1] = math.cos(pos / (10000 ** ((2 * i) / d_model)))

where pos is the position and i is the dimension.

Since the author uses for i in range(0, d_model, 2), i jumps two steps each iteration, so the loop variable i already equals the paper's 2i and the exponent should simply be i / d_model.

I think the original code in Embed.py:

for pos in range(max_seq_len):
    for i in range(0, d_model, 2):
        pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
        pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))

should be replaced by:

for pos in range(max_seq_len):
    for i in range(0, d_model, 2):
        pe[pos, i] = math.sin(pos / (10000 ** (i / d_model)))
        pe[pos, i + 1] = math.cos(pos / (10000 ** (i / d_model)))
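
For anyone who wants to double-check, here is a small standalone sketch (the helper names build_pe_fixed and build_pe_paper are mine, not from Embed.py) that builds the table with the corrected loop and compares it against the paper's formula written with an explicit pair index k:

import math
import numpy as np

def build_pe_fixed(max_seq_len, d_model):
    # Corrected loop: i already walks over even dimensions, so the exponent is i / d_model.
    pe = np.zeros((max_seq_len, d_model))
    for pos in range(max_seq_len):
        for i in range(0, d_model, 2):
            pe[pos, i] = math.sin(pos / (10000 ** (i / d_model)))
            pe[pos, i + 1] = math.cos(pos / (10000 ** (i / d_model)))
    return pe

def build_pe_paper(max_seq_len, d_model):
    # Paper's definition: PE(pos, 2k) = sin(pos / 10000^(2k / d_model)), PE(pos, 2k + 1) = cos(same angle).
    pe = np.zeros((max_seq_len, d_model))
    for pos in range(max_seq_len):
        for k in range(d_model // 2):
            angle = pos / (10000 ** ((2 * k) / d_model))
            pe[pos, 2 * k] = math.sin(angle)
            pe[pos, 2 * k + 1] = math.cos(angle)
    return pe

assert np.allclose(build_pe_fixed(80, 512), build_pe_paper(80, 512))
print("corrected loop matches the paper's formula")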