Closed hust220 closed 1 year ago
NOTE: if this is not a bug report, please use the GitHub Discussions for support questions (How do I do X?), feature requests, ideas, showcasing new applications, etc.
Bug description
ESM fails for protein Ccnd1 (Q790L7):
mehqllccevetirraypdtnllndrvlramlkteetcapsvsyfkcvqkeivpsmrkivatwmlevceeqkceeevfplamnyldrflsleplkksrlqllgatcmfvaskmketipltaeklciytdnsirpeellqmelllvnklkwnlaamtphdfiehflskmpeadenkqtirkhaqtfvalcatdvkfisnppsmvaagsvvaamqglnlgspnnflscyrtthflsrvikcdpdclracqeqieallesslrqaqqnvdpkateeegeveeeaglactptdvrdvdi
Reproduction steps
sequences = [('', 'mehqllccevetirraypdtnllndrvlramlkteetcapsvsyfkcvqkeivpsmrkivatwmlevceeqkceeevfplamnyldrflsleplkksrlqllgatcmfvaskmketipltaeklciytdnsirpeellqmelllvnklkwnlaamtphdfiehflskmpeadenkqtirkhaqtfvalcatdvkfisnppsmvaagsvvaamqglnlgspnnflscyrtthflsrvikcdpdclracqeqieallesslrqaqqnvdpkateeegeveeeaglactptdvrdvdi')]
batch_converter = self.alphabet.get_batch_converter()
batch_labels, batch_strs, batch_tokens = batch_converter(sequences)
with torch.no_grad():
    results = self.model(batch_tokens, repr_layers=[33])
embeddings = results["representations"][33]
Expected behavior Give a clear and concise description of what you expected to happen.
Logs
File ~/scratch/programs/miniconda/3/envs/torch/lib/python3.9/site-packages/esm/data.py:250, in (.0)
    249 def encode(self, text):
--> 250     return [self.tok_to_idx[tok] for tok in self.tokenize(text)]

KeyError: 'mehqllccevetirraypdtnllndrvlramlkteetcapsvsyfkcvqkeivpsmrkivatwmlevceeqkceeevfplamnyldrflsleplkksrlqllgatcmfvaskmketipltaeklciytdnsirpeellqmelllvnklkwnlaamtphdfiehflskmpeadenkqtirkhaqtfvalcatdvkfisnppsmvaagsvvaamqglnlgspnnflscyrtthflsrvikcdpdclracqeqieallesslrqaqqnvdpkateeegeveeeaglactptdvrdvdi'
Additional context Add any other context about the problem here. (like proxy settings, network setup, overall goals, etc.)
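Sorry, I found that the sequence has to be in upper case. The KeyError above shows the whole lowercase string being looked up as a single unknown token.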
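For anyone hitting the same KeyError, here is a minimal sketch of normalizing the input before it reaches the batch converter. `normalize_sequences` and `ALLOWED_RESIDUES` are hypothetical names, not part of the esm package, and the residue set is an assumption based on the usual uppercase one-letter amino-acid codes (gap characters used for MSA input are deliberately left out).

```python
# Hypothetical helper, not part of the esm package.
# Assumption: the model alphabet only knows uppercase one-letter residue codes.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZUO")


def normalize_sequences(sequences):
    """Uppercase every (label, sequence) pair and reject unexpected characters."""
    cleaned = []
    for label, seq in sequences:
        seq = seq.strip().upper()
        unexpected = sorted(set(seq) - ALLOWED_RESIDUES)
        if unexpected:
            raise ValueError(
                f"unexpected characters {unexpected} in sequence {label!r}"
            )
        cleaned.append((label, seq))
    return cleaned


sequences = normalize_sequences([("Ccnd1", "mehqllccev")])
print(sequences)  # [('Ccnd1', 'MEHQLLCCEV')]

# The cleaned list can then be passed on as in the reproduction steps:
# batch_labels, batch_strs, batch_tokens = batch_converter(sequences)
```

Failing early with a ValueError that names the offending characters is easier to read than a KeyError that echoes the entire sequence.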