Closed smiles724 closed 1 year ago
In the frontpage README we mention:
mean includes the embeddings averaged over the full sequence, per layer. bos includes the embeddings from the beginning-of-sequence token. (NOTE: Don't use with the pre-trained models - we trained without bos-token supervision)
In general we don't expect the bos (equivalent to CLS) token to have meaningful representations, as it hasn't been supervised.
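For concreteness, the difference between the two pooling strategies can be sketched with a toy token-embedding matrix. The array shapes and variable names below are illustrative only, not the library's API; in practice the per-token embeddings would come from the model's representations output for a chosen layer.

```python
import numpy as np

# Toy stand-in for one layer of per-token embeddings from a protein
# language model: (seq_len + 2) x hidden_dim, where row 0 is the
# bos token and the last row is the eos token.
rng = np.random.default_rng(0)
seq_len, hidden_dim = 8, 16
token_reprs = rng.normal(size=(seq_len + 2, hidden_dim))

# "bos" strategy: take the first token's embedding (CLS-style).
# Not recommended for these pre-trained models, since the bos token
# was never supervised to summarize the sequence.
bos_repr = token_reprs[0]

# "mean" strategy: average over the real residues only, excluding
# the bos (row 0) and eos (last row) special tokens.
mean_repr = token_reprs[1 : seq_len + 1].mean(axis=0)

print(bos_repr.shape, mean_repr.shape)
```

Both strategies yield a single hidden_dim-sized vector per sequence; the point is only which tokens contribute to it.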
Thanks for your reply.
Hi, thanks for providing such useful tools! However, I wonder what the best way is to gather the representation of a protein.
To be specific, the official documentation gives an example of generating per-sequence representations by averaging the token-level representations. But as you know, transformer-based models in NLP prefer using the first token [CLS] as the per-sequence representation.
Have you tried both of these methods? Or does the better choice depend empirically on the task?