renjie-liang closed this issue 2 years ago
This is NOT a bug. You can verify it yourself with a simple case: pass torch.rand(32, 10, 512) through the layer and print the shape of the result.
First, the position table only has two dims. Second, .repeat(bs, 1, 1) is used to generate the positional embedding for each sample in the batch.
The code here should be written like this; I would appreciate it if you could proofread it.
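A minimal sketch of such a layer, assuming a standard sinusoidal position table (the function name positional_encoding and the hard-coded shapes are illustrative, not taken from the repo):

```python
import math
import torch

def positional_encoding(bs, seq_len, d_model):
    # Build the 2-D sinusoidal position table: (seq_len, d_model).
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(seq_len, d_model)
    table[:, 0::2] = torch.sin(position * div_term)  # even columns
    table[:, 1::2] = torch.cos(position * div_term)  # odd columns
    # .repeat(bs, 1, 1) prepends a batch dim and copies the 2-D table
    # once per sample: (seq_len, d_model) -> (bs, seq_len, d_model).
    return table.repeat(bs, 1, 1)

x = torch.rand(32, 10, 512)  # (bs, seq_len, d_model)
pe = positional_encoding(x.size(0), x.size(1), x.size(2))
print(pe.shape)        # torch.Size([32, 10, 512])
print((x + pe).shape)  # torch.Size([32, 10, 512])
```

Since the table is identical for every sample, .repeat(bs, 1, 1) (or broadcasting via an unsqueezed batch dim) simply aligns its shape with the input batch before the addition.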