HannesStark / EquiBind

EquiBind: geometric deep learning for fast predictions of the 3D structure in which a small molecule binds to a protein
MIT License

Inference on my own PDB throws receptor error. #9

Closed JSLJ23 closed 2 years ago

JSLJ23 commented 2 years ago

Processing BRD4: complex 1 of 1
Trying to load data/predict/BRD4/BRD4_ligand.sdf
Docking the receptor data/predict/BRD4/BRD4_protein.pdb
To the ligand data/predict/BRD4/BRD4_ligand.sdf
Traceback (most recent call last):
  File "inference.py", line 471, in <module>
    inference_from_files(args)
  File "inference.py", line 339, in inference_from_files
    rec, rec_coords, c_alpha_coords, n_coords, c_coords = get_receptor(rec_path, lig, cutoff=dp['chain_radius'])
  File "/home/joshua-talo/GitHub/Python/EquiBind/commons/process_mols.py", line 373, in get_receptor
    c_alpha_coords = np.concatenate(valid_c_alpha_coords, axis=0) # [n_residues, 3]
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

The SDF file was generated with Open Babel for a drug of interest that I wish to dock, and the PDB was obtained from https://www.rcsb.org/structure/3mxf, from which I removed the bound JQ1 molecule and the other heteroatoms used for crystallography. The error is only thrown when I supply an SDF generated by other tools. When I use the SDF file from the PDB link itself, whose coordinates place the ligand at the binding site, it works and generates the predicted pose.

Looking into the source code, I think the get_receptor function searches for receptor atoms and their coordinates within a certain proximity of the ligand SDF coordinates. If this is the case, I think it is unable to find the receptor's alpha carbon atoms and thereby has nothing to concatenate. Am I right about this?
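The failure mode suspected above can be reproduced in isolation. The sketch below is not EquiBind code: `concat_coords` is a hypothetical helper, and the only fact it relies on is that `np.concatenate` raises this ValueError when given an empty list.

```python
import numpy as np


def concat_coords(per_chain_coords):
    """Concatenate per-chain [n_i, 3] arrays into one [n_total, 3] array.

    Hypothetical helper (not part of EquiBind): it fails with a more
    descriptive message when no chain survived an earlier filter.
    """
    if len(per_chain_coords) == 0:
        raise ValueError(
            "no receptor chain passed the distance filter; "
            "check that the ligand coordinates lie near the protein"
        )
    return np.concatenate(per_chain_coords, axis=0)


# Plain np.concatenate on an empty list raises the error seen in the traceback.
try:
    np.concatenate([], axis=0)
except ValueError as err:
    empty_error = str(err)

# With at least one array, concatenation works as expected.
stacked = concat_coords([np.zeros((2, 3)), np.ones((1, 3))])
```

If every chain is filtered out before this point, the list is empty and the call fails regardless of what the protein file contains.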

Also, may I know how I could go about doing full blind docking with SDF files of ligands generated from other tools to find novel binding sites on a protein?

Thank you.

JSLJ23 commented 2 years ago

Within process_mols.py I removed min_distance < cutoff from the condition if min_distance < cutoff and not chain_is_water: and it sort of works... But I am not sure what this if statement is doing exactly. I think it causes valid_chain_ids to be empty if the supplied SDF describes a ligand that is too far away from the actual binding site. With the distance check removed, valid_c_alpha_coords is no longer empty, so c_alpha_coords = np.concatenate(valid_c_alpha_coords, axis=0) has something to concatenate.

May I know what cutoff=dp['chain_radius'] is for?

HannesStark commented 2 years ago

Thanks for pointing this out and sorry for the delay.

This was a bug: When running inference using the provided weights it does not make sense to use the get_receptor function as I had it implemented previously.

The issue should be fixed now and I hope everything works when you pull!

Feel free to reopen the issue or make a new one if there are problems remaining.

JSLJ23 commented 2 years ago

Ok cool, thanks for fixing this. I am still slightly curious about the purpose of the valid chain ID portion of the get_receptor function. May I know what this was used for? Was it for deciding which protein chain within a multi-chain PDB file to pay attention to?

HannesStark commented 2 years ago

Yes, we only keep the connected components of the receptor that have an atom within a 10 Å radius of any ligand atom. This is meant to avoid symmetric complexes of multiple proteins with identical binding pockets, as visualized in Figure 14 of https://arxiv.org/abs/2202.05146
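A minimal sketch of that selection rule, for illustration only. It assumes each chain is treated as one component and is given as an [n_atoms, 3] coordinate array. The function name `chains_near_ligand` and the toy coordinates are hypothetical, and the water check from the original condition is left out.

```python
import numpy as np


def chains_near_ligand(chain_coords, lig_coords, cutoff=10.0):
    """Return IDs of chains with at least one atom closer than `cutoff`
    (in angstroms) to any ligand atom.

    chain_coords: dict mapping chain ID -> [n_atoms, 3] array (assumed layout)
    lig_coords:   [n_lig_atoms, 3] array
    """
    kept = []
    for chain_id, coords in chain_coords.items():
        # Pairwise differences via broadcasting: [n_atoms, n_lig_atoms, 3]
        diffs = coords[:, None, :] - lig_coords[None, :, :]
        min_distance = np.linalg.norm(diffs, axis=-1).min()
        if min_distance < cutoff:
            kept.append(chain_id)
    return kept


# Toy example: chain B is a distant symmetric copy of chain A.
chain_coords = {
    "A": np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]),
    "B": np.array([[50.0, 0.0, 0.0], [53.8, 0.0, 0.0]]),
}
lig_in_pocket = np.array([[1.0, 2.0, 0.0]])
lig_far_away = np.array([[200.0, 0.0, 0.0]])

near = chains_near_ligand(chain_coords, lig_in_pocket)  # only chain A is kept
far = chains_near_ligand(chain_coords, lig_far_away)    # no chain is kept
```

The second call shows why a ligand placed far from the protein, such as a conformer generated by an external tool with arbitrary coordinates, leaves no chains and triggers the concatenation error reported at the top of this issue.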