Closed hwlsrr closed 6 months ago
Dear Author,

While reading your tAPE code, I have a question: you state that the input shape is [sequence length, batch size, embed dim], yet when I pass an input of shape [batch size, sequence length, embed dim], the program still runs normally.

So I would like to ask: if my data input shape is [batch size, sequence length, embed dim], what do I need to change in the tAPE model to make sure it still performs well?

Hi, thank you for your comment. It appears there was an oversight on my part, and the correct input shape should indeed be [batch size, sequence length, embed dim]. Alternatively, you can interchange the embedding dimension and the sequence length if you wish to compute attention among different channels rather than among different timestamps.
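For reference, here is a minimal sketch of a tAPE-style positional encoding written explicitly for batch-first input. It assumes the frequency rescaling by embed dim / series length described in the ConvTran paper; the class and parameter names (`tAPE`, `max_len`, `d_model`) are illustrative and may differ from the author's actual code.

```python
import math
import torch
import torch.nn as nn


class tAPE(nn.Module):
    """Sketch of a tAPE-style positional encoding for batch-first input,
    i.e. x of shape [batch size, sequence length, embed dim].
    Assumes d_model is even."""

    def __init__(self, d_model: int, max_len: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)  # [max_len, 1]
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))            # [d_model/2]
        pe = torch.zeros(max_len, d_model)
        # Assumption: tAPE rescales the vanilla sinusoid by d_model / max_len
        # so that the encoding adapts to the series length.
        pe[:, 0::2] = torch.sin(position * div_term * (d_model / max_len))
        pe[:, 1::2] = torch.cos(position * div_term * (d_model / max_len))
        # Storing the table as [1, max_len, d_model] lets it broadcast over
        # the batch dimension, which is why a [batch, seq_len, embed_dim]
        # input "just works" at the addition below.
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch size, sequence length, embed dim]
        x = x + self.pe[:, : x.size(1), :]
        return self.dropout(x)
```

If you instead want attention among channels rather than timestamps, as suggested above, you could transpose the last two dimensions (e.g. `x.transpose(1, 2)`) before the encoder, so that the channel axis plays the role of the sequence axis.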