I read your blog post in TowardsDataScience on this model, and I think there may be a computational error on line 27 of Transformer/Embed.py. In the paper and in other implementations, like this one, we should have PE_(pos, 2i+1) = math.cos(pos / (10000 ** ((2 * i)/d_model))), not math.cos(pos / (10000 ** ((2 * (i + 1))/d_model))), as the code currently stands.
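To make the point concrete, here is a minimal sketch of the encoding as the paper's formula describes it. It is not the code from Embed.py: `positional_encoding` is a hypothetical helper, and `i` here is assumed to be the pair index running from 0 to d_model/2 - 1, so the sine at column 2i and the cosine at column 2i+1 use the same exponent 2i/d_model.

```python
import math


def positional_encoding(max_len, d_model):
    """Sinusoidal position table per the paper's formula (hypothetical helper)."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # Assumption: i is the pair index, not the column index.
        for i in range(d_model // 2):
            # Both columns of pair i share one angle: pos / 10000^(2i/d_model).
            angle = pos / (10000 ** ((2 * i) / d_model))
            pe[pos][2 * i] = math.sin(angle)      # PE_(pos, 2i)
            pe[pos][2 * i + 1] = math.cos(angle)  # PE_(pos, 2i+1)
    return pe


pe = positional_encoding(max_len=4, d_model=8)
```

Because each sine/cosine pair shares one angle, the squares of the two entries sum to 1 at every position. With the (i + 1) exponent the cosine would use a different frequency from its paired sine, and that identity would no longer hold.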