I found that when extracting word embeddings, the embedding matrix's size becomes (batch_size, max_length + 1, embedding_dim): the [CLS] position is included in the embedding matrix. Can I drop it by changing the stacking of token embeddings to `cap_embedding = torch.stack(tokens_embedding[1:])`?
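To illustrate what I mean, here is a minimal sketch. It assumes `tokens_embedding` is a single tensor of shape (batch_size, max_length + 1, embedding_dim) with [CLS] at index 0 along the sequence dimension (that shape and the random data are just placeholders); in that case, slicing along dim 1 seems simpler than re-stacking a list:

```python
import torch

# Placeholder dimensions and data, just for illustration.
batch_size, max_length, embedding_dim = 2, 8, 16
tokens_embedding = torch.randn(batch_size, max_length + 1, embedding_dim)

# Drop the [CLS] position (index 0 along the sequence dimension)
# by slicing, instead of stacking a list of per-token tensors.
cap_embedding = tokens_embedding[:, 1:, :]
print(cap_embedding.shape)  # torch.Size([2, 8, 16])
```

If `tokens_embedding` is instead a Python list of per-position tensors, then `torch.stack(tokens_embedding[1:])` as in my question would be the equivalent way to skip the [CLS] entry before stacking.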