JudeWells / chainsaw

Residues with non-sequential index #30

Closed by sillitoe 10 months ago

sillitoe commented 10 months ago

We've run into the error "ValueError: Index 70 not in model_res_label_by_index" when attempting to make a prediction for the PDB file 5yclA. Log output and traceback:

10/16/2023 03:21:54 PM | INFO | Making prediction for file 5yclA.pdb (chain '5yclA')
10/16/2023 03:21:54 PM | WARNING | No chain specified for /scratch0/nbordin/chainsaw-3158440-9/holdingpen_008/pdb/5yclA.pdb, using first chain
10/16/2023 03:21:54 PM | INFO | Running command: /SAN/orengolab/af_esm/tools/chainsaw/stride/stride /scratch0/nbordin/chainsaw-3158440-9/holdingpen_008/pdb/5yclA_renum.pdb -rA
10/16/2023 03:21:54 PM | INFO | Distance matrix shape: (1, 131, 131), SS matrix shape: (131, 131)
10/16/2023 03:21:55 PM | INFO | Segments (index to label): ['4-68'] -> ['7-71']
Traceback (most recent call last):
  File "get_predictions.py", line 332, in <module>
    main(parse_args())
  File "get_predictions.py", line 264, in main
    result = predict(model, pdb_path, ss_mod=args.ss_mod, pdbchain=pdb_chain_id)
  File "get_predictions.py", line 175, in predict
    segs_str = [f"{seg.start_label}-{seg.end_label}" for seg in dom.segs]
  File "get_predictions.py", line 175, in <listcomp>
    segs_str = [f"{seg.start_label}-{seg.end_label}" for seg in dom.segs]
  File "get_predictions.py", line 143, in start_label
    return self.res_label_of_index(self.start_index)
  File "get_predictions.py", line 138, in res_label_of_index
    raise ValueError(f"Index {index} not in model_res_label_by_index ({model_res_label_by_index})")
ValueError: Index 70 not in model_res_label_by_index ({1: '4', 2: '5', 3: '6', 4: '7', 5: '8', 6: '9', 7: '10', 8: '11', 9: '12', 10: '13', 11: '14', 12: '15', 13: '16', 14: '17', 15: '18', 16: '19', 17: '20', 18: '21', 19: '22', 20: '23', 21: '24', 22: '25', 23: '26', 24: '27', 25: '28', 26: '29', 27: '30', 28: '31', 29: '32', 30: '33', 31: '34', 32: '35', 33: '36', 34: '37', 35: '38', 36: '39', 37: '40', 38: '41', 39: '42', 40: '43', 41: '44', 42: '45', 43: '46', 44: '47', 45: '48', 46: '49', 47: '50', 48: '51', 49: '52', 50: '53', 51: '54', 52: '55', 53: '56', 54: '57', 55: '58', 56: '59', 57: '60', 58: '61', 59: '62', 60: '63', 61: '64', 62: '65', 63: '66', 64: '67', 65: '68', 66: '69', 67: '70', 68: '71', 69: '72', 71: '74', 72: '75', 73: '76', 75: '78', 76: '79', 77: '80', 78: '81', 79: '82', 80: '83', 81: '84', 82: '85', 83: '86', 84: '87', 86: '89', 87: '90', 88: '91', 89: '92', 90: '93', 91: '94', 92: '95', 93: '96', 94: '97', 95: '98', 96: '99', 97: '100', 98: '101', 99: '102', 100: '103', 101: '104', 102: '105', 104: '107', 105: '108', 106: '109', 107: '110', 108: '111', 109: '112', 110: '113', 111: '117', 112: '118', 113: '119', 114: '120', 115: '121', 116: '122', 117: '123', 118: '124', 119: '125', 120: '126', 121: '127', 122: '128', 123: '129', 124: '130', 125: '131', 126: '132', 127: '133', 128: '134', 129: '135', 130: '136', 131: '137', 132: '138', 133: '139', 134: '140', 135: '141'})

Note: the index skips from 69 to 71. Indices 74, 85 and 103 are missing from the mapping as well.
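A quick way to confirm which indices are missing from a mapping like the one in the traceback. This is only a sketch: `missing_indices` is a hypothetical helper, not part of chainsaw, and the dictionary below is a short excerpt copied from the error message rather than the full mapping.

```python
def missing_indices(index_to_label):
    """Return the integer keys absent between the smallest and largest key."""
    if not index_to_label:
        return []
    present = set(index_to_label)
    return [i for i in range(min(present), max(present) + 1) if i not in present]


# Excerpt of model_res_label_by_index taken from the ValueError above.
excerpt = {68: '71', 69: '72', 71: '74', 72: '75', 73: '76', 75: '78'}
gaps = missing_indices(excerpt)
print(gaps)
```

Running the same helper on the full dictionary would list every gap at once, which makes it easier to compare against the residues dropped by filtering.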

It seems likely that this happens because the index is created before the residues are filtered for non-standard amino acids, so each filtered residue leaves a gap. If so, the ideal solution would be for the index to be sequential with respect to the final, filtered sequence, while preserving the original PDB residue labels.
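A minimal sketch of the suggested approach, assuming the root cause is as guessed above: filter first, then number what remains. The function name `build_index_to_label`, the `(label, resname)` input shape and the toy residues are all assumptions made for illustration; they are not chainsaw's actual code or data from 5yclA.

```python
# The 20 standard amino acids, as three-letter PDB residue names.
STANDARD_RESIDUES = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})


def build_index_to_label(residues, first_index=1):
    """Map sequential model indices to PDB residue labels.

    `residues` is an iterable of (pdb_label, resname) pairs in chain order
    (hypothetical input shape). Non-standard residues are dropped *before*
    numbering, so the resulting indices have no gaps.
    """
    kept_labels = [
        label for label, resname in residues
        if resname.upper() in STANDARD_RESIDUES
    ]
    return dict(enumerate(kept_labels, start=first_index))


# Toy chain (not taken from 5yclA): residue 11 is non-standard (MSE) and
# the PDB numbering itself jumps from 12 to 14.
toy_chain = [("10", "ALA"), ("11", "MSE"), ("12", "GLY"), ("14", "LEU")]
index_to_label = build_index_to_label(toy_chain)
print(index_to_label)
```

With this ordering, the keys always run from 1 to the length of the filtered sequence, so they line up with the rows of the distance matrix, while the values keep the original PDB labels, including any gaps in the PDB numbering.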