JudeWells / chainsaw

MIT License

Add chain specifier #23

Closed: bordin89 closed this issue 10 months ago.

bordin89 commented 11 months ago

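Currently Chainsaw effectively hardcodes chain A, because AlphaFold models only contain chain A. `predict()` takes a `pdbchain` argument, but it defaults to "A" and the caller never overrides it: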

import hashlib
import time
from typing import List

def predict(model, pdb_path, renumber_pdbs=True, ss_mod=False, pdbchain="A") -> List[PredictionResult]:
    """
    Makes the prediction and returns a list of PredictionResult objects
    """
    start = time.time()

    # get model structure metadata
    model_structure = featurisers.get_model_structure(pdb_path)
    # only the chain named by pdbchain (default "A") is read; a missing chain raises KeyError
    model_structure_seq = featurisers.get_model_structure_sequence(model_structure, chain=pdbchain)
    model_structure_md5 = hashlib.md5(model_structure_seq.encode('utf-8')).hexdigest()

To generalise this to PDB files with multiple chains (e.g. 4wgvC), we should be able to specify the chain. Otherwise, running on a structure that has no chain A fails like this:
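One way to expose the chain is sketched below. It is a minimal sketch, not Chainsaw's actual CLI: the `--pdbchain` flag and the helpers `chain_from_stem` and `choose_chain` are hypothetical names. It also assumes that file names like `4wgvC.pdb` are a 4-character PDB code followed by the chain ID. An explicit flag wins, then the chain inferred from the file name, and finally the current default "A".

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    # Hypothetical CLI: --pdbchain is an illustrative flag name, not an existing Chainsaw option.
    parser = argparse.ArgumentParser(description="Chain-aware prediction (sketch)")
    parser.add_argument("structure_file", type=Path)
    parser.add_argument(
        "--pdbchain",
        default=None,
        help="chain ID to process; if omitted, inferred from a '4wgvC'-style file name",
    )
    return parser.parse_args(argv)


def chain_from_stem(stem: str):
    """Split a 'PDB code + chain' stem such as '4wgvC' into ('4wgv', 'C').

    Assumes a 4-character PDB code that starts with a digit, followed by the chain ID.
    Returns (stem, None) when the stem does not look like that.
    """
    if len(stem) > 4 and stem[0].isdigit():
        return stem[:4], stem[4:]
    return stem, None


def choose_chain(args) -> str:
    # 1. An explicitly requested chain always wins.
    if args.pdbchain:
        return args.pdbchain
    # 2. Otherwise try to infer it from the file name.
    _, chain = chain_from_stem(args.structure_file.stem)
    # 3. Fall back to "A", the current default that matches single-chain AlphaFold models.
    return chain or "A"
```

With this in place, the failing call in `main` could become something like `predict(model, pdb_path, ss_mod=args.ss_mod, pdbchain=choose_chain(args))`. For example, `choose_chain(parse_args(["4wgvC.pdb"]))` selects chain C, while an AlphaFold file name such as `AF-P12345-F1-model_v4.pdb` still falls back to A.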

Traceback (most recent call last):
  File "get_predictions.py", line 292, in <module>
    main(parse_args())
  File "get_predictions.py", line 223, in main
    result = predict(model, pdb_path, ss_mod=args.ss_mod)
  File "get_predictions.py", line 105, in predict
    model_structure_seq = featurisers.get_model_structure_sequence(model_structure, chain=pdbchain)
  File "/SAN/orengolab/af_esm/tools/chainsaw/src/featurisers.py", line 34, in get_model_structure_sequence
    residues = [c for c in structure_model[chain].child_list]
  File "/SAN/orengolab/af_esm/tools/chainsaw/venv/lib/python3.8/site-packages/Bio/PDB/Entity.py", line 45, in __getitem__
    return self.child_dict[id]
KeyError: 'A'
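The `KeyError: 'A'` comes from Biopython: indexing a `Model` with a chain ID it does not contain looks the ID up in `child_dict` and fails. A hypothetical `resolve_chain` helper could check the requested chain first and report which chains exist. The sketch assumes it receives the same Bio.PDB `Model` that `get_model_structure_sequence` indexes. Iterating over that model yields `Chain` objects with an `.id` attribute. The helper relies only on that, so any iterable of objects with `.id` works.

```python
from typing import Iterable, Optional


def resolve_chain(structure_model: Iterable, requested: Optional[str] = None) -> str:
    """Hypothetical helper: validate a chain ID before indexing the model with it."""
    # Iterating over a Bio.PDB Model yields its Chain objects, each exposing `.id`.
    available = [chain.id for chain in structure_model]
    if requested is None:
        # With no explicit request, only a single-chain model is unambiguous.
        if len(available) == 1:
            return available[0]
        raise ValueError(
            f"Structure has {len(available)} chains {available}; please specify one"
        )
    if requested not in available:
        raise ValueError(f"Chain {requested!r} not found; available chains: {available}")
    return requested
```

A `ValueError` that lists the available chains (e.g. `['C', 'D']`) tells the user which `pdbchain` to pass. A bare `KeyError: 'A'` deep inside Biopython does not.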