Thanks for your question! During training we cropped sequences longer than 1024 tokens, so the model (specifically its learned positional embeddings) cannot handle longer sequences. See #21 and #76 for prior discussion of this topic.
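If it helps, a common workaround is to truncate sequences before running extract.py. Below is a minimal sketch using Biopython (not part of this repo); it caps sequences at 1022 residues on the assumption that the BOS and EOS tokens consume two of the 1024 positions, and the output filename is illustrative:

```python
# Minimal sketch, not part of the ESM repo: truncate FASTA entries so each
# sequence fits ESM-1b's 1024-position limit. The 1022-residue cap assumes
# the BOS and EOS tokens occupy two of those positions.
from Bio import SeqIO  # Biopython

MAX_RESIDUES = 1022

records = []
for record in SeqIO.parse("AB024414.fasta", "fasta"):
    if len(record.seq) > MAX_RESIDUES:
        record.seq = record.seq[:MAX_RESIDUES]  # keep the N-terminal portion
    records.append(record)

SeqIO.write(records, "AB024414.truncated.fasta", "fasta")
```

The truncated file can then be passed to extract.py in place of the original.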
Thank you for your reply!
Code:

```
python extract.py esm1b_t33_650M_UR50S AB024414.fasta esm1b/AB024414 --repr_layers 0 32 33 --include mean
```
Bug description:

```
Transferred model to GPU
Read /home1/……/AB024414.fasta with 65 sequences
Processing 1 of 11 batches (16 sequences)
Processing 2 of 11 batches (11 sequences)
Processing 3 of 11 batches (9 sequences)
Processing 4 of 11 batches (7 sequences)
Processing 5 of 11 batches (5 sequences)
Processing 6 of 11 batches (5 sequences)
Processing 7 of 11 batches (4 sequences)
Processing 8 of 11 batches (3 sequences)
Traceback (most recent call last):
  File "extract.py", line 136, in <module>
    main(args)
  File "extract.py", line 95, in main
    out = model(toks, repr_layers=repr_layers, return_contacts=return_contacts)
  File "/home1/……/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home1/……/tool/esm/esm/model.py", line 136, in forward
    x = x + self.embed_positions(tokens)
  File "/home1/……/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home1/……/tool/esm/esm/modules.py", line 242, in forward
    f"Sequence length {input.size(1)} above maximum "
ValueError: Sequence length 1042 above maximum sequence length of 1024.
```
Do I need to split my protein sequences into chunks of at most 1024 residues? Why does this issue occur? I would appreciate it if you could help me.
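For anyone who hits the same error: if plain truncation loses too much of the protein, long sequences can instead be split into overlapping windows that each fit the model, with the per-window embeddings combined afterwards. A rough sketch; the 1022-residue window and the 256-residue overlap are illustrative choices, not something prescribed by the ESM authors:

```python
# Rough sketch: split a long sequence into overlapping windows that each fit
# ESM-1b's 1024-position limit (1022 residues once BOS/EOS are added).
# The 256-residue overlap is an arbitrary illustrative choice.
def window_sequence(seq, window=1022, overlap=256):
    if len(seq) <= window:
        return [seq]
    step = window - overlap
    # Consecutive windows overlap by `overlap` residues; the last window
    # always reaches the end of the sequence.
    return [seq[i:i + window] for i in range(0, len(seq) - overlap, step)]

chunks = window_sequence("M" * 1042)
print([len(c) for c in chunks])  # [1022, 276]
```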