Open GriffithLin opened 10 months ago
You do not always need BOS and EOS tokens, even if you don’t have a transformer decoder. However, if you are fine-tuning ESM-2 for a specific downstream task where you intend to use BOS and EOS tokens, you should include them as special tokens.
Hi! I have a problem when I use ESM-2 to embed long protein sequences. A long protein sequence needs to be cropped to a length of less than 1024, and BOS and EOS tokens are used to signal the beginning and end of a real protein. My question is: how do I input a cropped sequence that contains only a BOS, only an EOS, or neither of them? Thanks in advance.
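
One possible way to handle this, sketched in plain Python: split the protein into windows and attach the BOS token only to the first window and the EOS token only to the last, so middle windows carry neither. The helper name `crop_with_special_tokens`, the `max_residues` default of 1022, and the use of the token strings `<cls>` and `<eos>` for BOS and EOS are assumptions made for illustration; this is not taken from the ESM code base.

```python
from typing import List

# Token strings assumed for BOS and EOS in this sketch.
BOS, EOS = "<cls>", "<eos>"


def crop_with_special_tokens(sequence: str, max_residues: int = 1022) -> List[List[str]]:
    """Split a protein into windows of at most `max_residues` residues.

    BOS is added only to the window that starts the real protein,
    EOS only to the window that ends it. (Hypothetical helper.)
    """
    if max_residues < 1:
        raise ValueError("max_residues must be at least 1")

    chunks: List[List[str]] = []
    for start in range(0, len(sequence), max_residues):
        residues = list(sequence[start:start + max_residues])
        is_first = start == 0
        is_last = start + max_residues >= len(sequence)
        tokens = ([BOS] if is_first else []) + residues + ([EOS] if is_last else [])
        chunks.append(tokens)
    return chunks


# Toy example with a tiny window so the three cases are visible.
chunks = crop_with_special_tokens("MKTAYIAKQR", max_residues=4)
for chunk in chunks:
    print(chunk)
```

With the toy input, the first window starts with BOS, the middle window has no special tokens, and the last window ends with EOS. If you tokenize with the Hugging Face tokenizer instead, calling it with `add_special_tokens=False` and then adding the BOS/EOS IDs yourself where needed should give the same effect, but please check that against the tokenizer version you are using.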