While computing embeddings in bulk from a FASTA file (esm-extract esm2_t33_650M_UR50D all_spike_protein.fasta outputs/ --repr_layers 0 32 33 --include mean per_tok --nogpu), I get the following error:
Traceback (most recent call last):
  File "/raid/home/smrutip/anaconda3/envs/genslm/bin/esm-extract", line 8, in <module>
    sys.exit(main())
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/esm/scripts/extract.py", line 137, in main
    run(args)
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/esm/scripts/extract.py", line 88, in run
    for batch_idx, (labels, strs, toks) in enumerate(data_loader):
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
    return self.collate_fn(data)
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/esm/data.py", line 266, in __call__
    seq_encoded_list = [self.alphabet.encode(seq_str) for seq_str in seq_str_list]
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/esm/data.py", line 266, in <listcomp>
    seq_encoded_list = [self.alphabet.encode(seq_str) for seq_str in seq_str_list]
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/esm/data.py", line 250, in encode
    return [self.tok_to_idx[tok] for tok in self.tokenize(text)]
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/esm/data.py", line 250, in <listcomp>
    return [self.tok_to_idx[tok] for tok in self.tokenize(text)]
KeyError: 'J'
Could you please explain what this error means and how to fix it? @tomsercu, @joshim5, @rmrao, @naailkhan28, @liujas000, @nikitos9000, @ebetica, @chloechsu, @YaoYinYing
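For context on the question above: the last frame shows self.tok_to_idx[tok] raising KeyError: 'J', which suggests that at least one sequence in the FASTA file contains the letter J (the IUPAC ambiguity code for leucine or isoleucine) and that this letter is not a key in the model's token dictionary. A minimal sketch for locating such sequences before running esm-extract is given here. It is not part of the esm package: the helper names (read_fasta, find_unsupported, mask_unsupported) are hypothetical, and the ALLOWED set is an assumption about which residue letters the alphabet accepts, so it should be checked against the installed version.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: residue letters accepted by the model's alphabet.
# Verify this against the tok_to_idx keys of the installed esm version.
ALLOWED = set("LAGVSERTIDPKQNFYMHWCXBUZO")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs from the lines of a FASTA file."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def find_unsupported(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each label to the sorted letters of its sequence not in ALLOWED.

    Case is ignored in this sketch; records with no such letters are omitted.
    """
    bad = {}
    for label, seq in records:
        extra = sorted(set(seq.upper()) - ALLOWED)
        if extra:
            bad[label] = extra
    return bad


def mask_unsupported(seq: str, placeholder: str = "X") -> str:
    """Replace every letter outside ALLOWED with a placeholder residue."""
    return "".join(c if c.upper() in ALLOWED else placeholder for c in seq)


if __name__ == "__main__":
    with open("all_spike_protein.fasta") as handle:
        for label, letters in find_unsupported(read_fasta(handle)).items():
            print(label, letters)
```

Whether to drop the affected sequences or to mask the offending positions with X depends on the downstream analysis. Masking keeps the sequence length unchanged but discards the information that the residue was either leucine or isoleucine.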