When running zero-shot variant prediction with msa1b using the code provided in examples/variant-prediction, I came across the following error:
  File "predict.py", line 180, in <lambda>
    lambda row: label_row(
  File "predict.py", line 114, in label_row
    score = token_probs[0, 1 + idx, mt_encoded] - token_probs[0, 1 + idx, wt_encoded]
IndexError: index 216 is out of bounds for dimension 1 with size 216
The command I used is as follows:
python predict.py \
    --model-location esm_msa1b_t12_100M_UR50S \
    --sequence MSIQHFRVALIPFFAAFCLPVFAHPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW \
    --dms-input ./data/BLAT_ECOLX_Ranganathan2015.csv \
    --mutation-col mutant \
    --dms-output ./data/BLAT_ECOLX_Ranganathan2015_labeled.csv \
    --offset-idx 1 \
    --scoring-strategy masked-marginals \
    --msa-path ./data/MSA/trial_BLAT.a2m
I used the entire BLAT_ECOLX sequence (286 aa) as the input sequence, and all the entries in my .a2m file are the same length. I also set --offset-idx to 1, but that does not seem to help. I printed the dimensions of batch_tokens and token_probs in predict.py and found that the size I believe represents the protein sequence length is 216, whereas it should be 286 in this case.
I also tested other proteins of different lengths, but the dimensions never match. Am I misunderstanding the dimensions of token_probs?
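To narrow this down, here is a minimal sketch that lists which mutations would index past the token dimension. It assumes that predict.py computes idx as the mutation position minus --offset-idx (suggested by the traceback line, but not confirmed here) and that mutation strings look like "H24C". The function name and the example mutations are hypothetical and are not taken from the CSV.

```python
def out_of_range_mutations(mutants, offset_idx, token_dim):
    """Return (mutant, token_index) pairs whose index falls outside [0, token_dim)."""
    flagged = []
    for mutant in mutants:
        position = int(mutant[1:-1])   # e.g. "H24C" -> 24
        idx = position - offset_idx    # assumption: mirrors how predict.py derives idx
        token_index = 1 + idx          # the "1 + idx" seen in the traceback
        if not 0 <= token_index < token_dim:
            flagged.append((mutant, token_index))
    return flagged


# Hypothetical mutation strings, for illustration only.
example_mutants = ["H24C", "A216G", "W286A"]
print(out_of_range_mutations(example_mutants, offset_idx=1, token_dim=216))
```

With --offset-idx 1 and a token dimension of 216, any mutation at position 216 or later would be flagged, which would be consistent with the "index 216 is out of bounds" message above.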
In addition, running the demonstration code under examples/variant-prediction with the data provided in that directory results in the following error:
RuntimeError: Received unaligned sequences for input to MSA, all sequence lengths must be equal.
Command:
python predict.py \
    --model-location esm_msa1b_t12_100M_UR50S \
    --sequence HPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW \
    --dms-input ./data/BLAT_ECOLX_Ranganathan2015.csv \
    --mutation-col mutant \
    --dms-output ./data/BLAT_ECOLX_Ranganathan2015_labeled.csv \
    --offset-idx 24 \
    --scoring-strategy masked-marginals \
    --msa-path ./data/BLAT_ECOLX_1_b0.5.a3m
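Since both errors involve sequence lengths, a check like the following might help tell raw row lengths apart from match-column lengths. It is only a sketch: it assumes the usual A2M/A3M convention that lowercase letters and "." mark insertions relative to the query, and I have not confirmed whether predict.py strips those characters before batching. The helper names and the toy alignment are hypothetical.

```python
import string

# Assumption: lowercase letters and "." are insertion states (A2M/A3M convention).
_STRIP_INSERTIONS = str.maketrans("", "", string.ascii_lowercase + ".")


def read_alignment(text):
    """Parse FASTA-style alignment text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_column_lengths(records):
    """Length of each row after removing insertion characters."""
    return [len(seq.translate(_STRIP_INSERTIONS)) for _, seq in records]


# Toy alignment for illustration; real input would be the text of the .a2m/.a3m file.
toy = ">query\nMKT\n>hit1\nMaK-\n>hit2\n-KT\n"
records = read_alignment(toy)
print("raw lengths:  ", [len(seq) for _, seq in records])
print("match lengths:", match_column_lengths(records))
```

If the raw lengths differ but the match lengths agree, the file is probably a valid A3M that needs its insertions removed before being passed to the model. If the match length of the first row differs from the length of --sequence, that could explain a 216 versus 286 mismatch.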