HannesStark / EquiBind

EquiBind: geometric deep learning for fast predictions of the 3D structure in which a small molecule binds to a protein
MIT License
479 stars 109 forks source link

Could you provide QuickVina2 and Smina finetuning script? #15

Closed lkfo415579 closed 2 years ago

lkfo415579 commented 2 years ago

To my understanding, Smina will first randomly place the ligand in the pocket and do Monte Carlo searching based on energy function. In another word, the result of EQUIBINDS should be as same as Smina, but the result of the paper showed there is a huge improvement from 11.6->19.6.

Could you provide QuickVina2 and Smina finetuning script? I am curious.


It is my guess below. Your experiment pipline should be rigid_redocking model first, then using smina to do the finetuning. In this pipeline, actually maybe because of the ligand's initial conformer is exactly the one in the co-crystal ligand pose (rotatable bond is correct), and the Smina result(11.6) is only using rdkit conformer (rotatable bond may not be correct), finally, it causes different result. The different rotatable bond of ligand will casue very different docking results while using traditional docking software.

HannesStark commented 2 years ago

Hi and thanks for the issue!

The numbers of SMINA and EquiBindS are different because SMINA searches in the bounding box as defined by the whole receptor. Meanwhile, EquiBindS only runs SMINA in a small bounding box around the ligand that was predicted by EquiBind-R. EquiBind-R only receives the RDKit ligand as input and SMINA directly uses the output of EquiBind-R.

The numbers of EquiBindS are better than those of SMINA since EquiBind-R often provides a correct approximate location for where to search with the smaller bounding box.

So it is correct that the experiment pipeline uses the rigid_docking model first, but this is with an RDKit conformer as input during inference and not with the co-crystallized ligand pose.

I hope this clarifies everything. We will describe this more clearly in the next version of the paper - thanks for pointing it out!


Here is the code that we use for EquiBindQ. For EquiBindS it is similar, but QickVina2 is replaced with SMINA.


start_time = time.time()
data_path = 'equibind_output'
names = sorted(os.listdir(data_path))
exhaustiveness = 8
radius = 5
num_cpu = 16
tune_from = 'lig_eb'

rdkit_coords = True
if rdkit_coords == True:
    rdstring = '_rdkitcoords'
else:
    rdstring = ''

output_name = f'lig_qvina2_{tune_from}_ex{exhaustiveness}{rdstring}_r{radius}'

if not os.path.exists(output_name):
    os.mkdir(output_name)
all_times = []
for i, name in enumerate(names):
    single_time = time.time()

    # load coordinates to get bounding box
    path = os.path.join(data_path, name, f'{tune_from}.pdb')
    lig = PandasPdb().read_pdb(path)
    rec_df = lig.df['HETATM']
    rec_coords = rec_df[['x_coord', 'y_coord', 'z_coord']].to_numpy().squeeze().astype(np.float32)
    center_x = (rec_coords[:, 0].max() + rec_coords[:, 0].min()) / 2
    size_x = (rec_coords[:, 0].max() - rec_coords[:, 0].min()) + radius
    center_y = (rec_coords[:, 1].max() + rec_coords[:, 1].min()) / 2
    size_y = (rec_coords[:, 1].max() - rec_coords[:, 1].min()) + radius
    center_z = (rec_coords[:, 2].max() + rec_coords[:, 2].min()) / 2
    size_z = (rec_coords[:, 2].max() - rec_coords[:, 2].min()) + radius
    if not os.path.exists(f'{output_name}/{name}'):
        os.mkdir(f'{output_name}/{name}')
    # call qvina2 wide to find binding pose
    rec_path = os.path.join(data_path, name, 'rec.pdbqt')
    lig_path = os.path.join(data_path, name, f'{tune_from}.pdbqt')
    out_path = os.path.join(f'{output_name}', name,
                            f'{output_name}.pdbqt')
    log_path = os.path.join(f'{output_name}', name,
                            f'{output_name}.log')

    return_code = subprocess.run(
        f"~/qvina2/qvina02 --receptor {rec_path} --ligand {lig_path} --num_modes 1 --center_x {center_x} --center_y {center_y} --center_z {center_z} --size_x {size_x} --size_y {size_y} --size_z {size_z} --out {out_path} --log {log_path} --cpu {num_cpu} --exhaustiveness {exhaustiveness}",
        shell=True)

    all_times.append(time.time() - single_time)

    print("single time: --- %s seconds ---" % (time.time() - single_time))
    print("time so far: --- %s seconds ---" % (time.time() - start_time))
    print('\n')
print(all_times)
print("--- %s seconds ---" % (time.time() - start_time))```
lkfo415579 commented 2 years ago

Thank you for quick reply, appreciate it. By the way this appoarch may be the key to unlock binding affinity prediction end to end DNN training challenge, did you guys think about it before?

HannesStark commented 2 years ago

I assume that things are clarified and am closing the issue.

HannesStark commented 2 years ago

We are considering different approaches for affinity prediction based on EquiBind and integrating affinity information into EquiBind. Let me know if that is not what you meant!