facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

ESM fails for the protein Ccnd1 (Q790L7) #520

Closed by hust220 1 year ago

hust220 commented 1 year ago

Bug description

ESM fails for the protein Ccnd1 (Q790L7):

mehqllccevetirraypdtnllndrvlramlkteetcapsvsyfkcvqkeivpsmrkivatwmlevceeqkceeevfplamnyldrflsleplkksrlqllgatcmfvaskmketipltaeklciytdnsirpeellqmelllvnklkwnlaamtphdfiehflskmpeadenkqtirkhaqtfvalcatdvkfisnppsmvaagsvvaamqglnlgspnnflscyrtthflsrvikcdpdclracqeqieallesslrqaqqnvdpkateeegeveeeaglactptdvrdvdi

Reproduction steps

sequences = [('', 'mehqllccevetirraypdtnllndrvlramlkteetcapsvsyfkcvqkeivpsmrkivatwmlevceeqkceeevfplamnyldrflsleplkksrlqllgatcmfvaskmketipltaeklciytdnsirpeellqmelllvnklkwnlaamtphdfiehflskmpeadenkqtirkhaqtfvalcatdvkfisnppsmvaagsvvaamqglnlgspnnflscyrtthflsrvikcdpdclracqeqieallesslrqaqqnvdpkateeegeveeeaglactptdvrdvdi')]
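# self.alphabet and self.model come from the enclosing class (not shown)
batch_converter = self.alphabet.get_batch_converter()
# The KeyError below is raised here, while the sequence is tokenized
batch_labels, batch_strs, batch_tokens = batch_converter(sequences)
with torch.no_grad():
    results = self.model(batch_tokens, repr_layers=[33])
# Per-token representations from layer 33
embeddings = results["representations"][33]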

Logs

File ~/scratch/programs/miniconda/3/envs/torch/lib/python3.9/site-packages/esm/data.py:250, in <listcomp>(.0)
    249 def encode(self, text):
--> 250     return [self.tok_to_idx[tok] for tok in self.tokenize(text)]

KeyError: 'mehqllccevetirraypdtnllndrvlramlkteetcapsvsyfkcvqkeivpsmrkivatwmlevceeqkceeevfplamnyldrflsleplkksrlqllgatcmfvaskmketipltaeklciytdnsirpeellqmelllvnklkwnlaamtphdfiehflskmpeadenkqtirkhaqtfvalcatdvkfisnppsmvaagsvvaamqglnlgspnnflscyrtthflsrvikcdpdclracqeqieallesslrqaqqnvdpkateeegeveeeaglactptdvrdvdi'

hust220 commented 1 year ago

Sorry, I found that the sequence should be in upper case.
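
For anyone hitting the same KeyError: the key in the log is the entire lower-case sequence, which suggests it was never split into residue tokens. A minimal sketch of a guard that upper-cases sequences before they reach the batch converter is shown here. `normalize_sequences` and `ASSUMED_RESIDUES` are hypothetical names, not part of the esm package, and the allowed letter set is an assumption that may need adjusting to the alphabet actually loaded.

```python
# Hypothetical helper, not part of the esm package.
# ASSUMED_RESIDUES is an assumption: the 20 standard amino acids plus
# the ambiguity/rare codes X, B, Z, U, O. Adjust it to the alphabet you load.
ASSUMED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZUO")


def normalize_sequences(pairs):
    """Return (label, sequence) pairs with sequences stripped and upper-cased."""
    normalized = []
    for label, seq in pairs:
        seq = seq.strip().upper()
        # Fail early with a readable message instead of a KeyError later on.
        unknown = sorted(set(seq) - ASSUMED_RESIDUES)
        if unknown:
            raise ValueError(f"{label!r}: unexpected characters {unknown}")
        normalized.append((label, seq))
    return normalized


# Short prefix of the sequence from the report, used only for illustration.
sequences = normalize_sequences([("Ccnd1", "mehqllccevetirraypdtnll")])
print(sequences)
```

The returned list has the same `(label, sequence)` shape as `sequences` in the reproduction steps, so it can be passed to `batch_converter(sequences)` unchanged.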