gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License

Any way to dock many ligands with the same protein instead of loading the same protein every time? #212

Open sky1ove opened 2 months ago

sky1ove commented 2 months ago

I'm testing a dataset of ligands against the same protein. Instead of loading the same protein PDB file every time, is there any way to load the PDB once and just start docking new ligands?

tornikeo commented 2 months ago

Anything short of modifying the code a bit won't work.

One way would be to "memoize" the function you suspect is the slowest using joblib's memory cache. Basically, find the function that causes the most delay and decorate it with joblib's cache decorator. AFAIK this is the simplest way to do what you want.
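In case it helps, here is a minimal sketch of that idea using joblib.Memory. The function name and the choice of wrapping the receptor preprocessing are assumptions on my part, not DiffDock's actual API:

    from joblib import Memory

    # Disk-backed cache; results persist across runs in ./.joblib_cache
    memory = Memory("./.joblib_cache", verbose=0)

    @memory.cache
    def preprocess_receptor(pdb_path):
        # Hypothetical wrapper around whatever slow step you identified
        # (e.g. the ESM embedding of the protein sequence).
        ...

    # The first call computes and stores the result; subsequent calls with
    # the same pdb_path load it from disk instead of recomputing.
    receptor = preprocess_receptor("protein.pdb")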

Let me know if this helps.

demian3b commented 2 months ago

What I do locally is memoize the ESM embeddings. Since the most expensive computation during target preprocessing is ESM, caching it reduces preprocessing time a lot. It can be done by modifying the code like this:

# Collect the unique protein sequences so each one is embedded only once.
unique_sequences = compute_unique(list_of_protein_input)

labels, sequences = [], []
for protein_info, sequence in unique_sequences.items():
    # Multi-chain proteins are stored as ":"-separated chain sequences.
    s = sequence.split(":")
    sequences.extend(s)
    labels.extend([(*protein_info, j) for j in range(len(s))])

# Run ESM once over the deduplicated chain sequences.
lm_embeddings = compute_ESM_embeddings(model, alphabet, labels, sequences)

# Regroup the per-chain embeddings by protein.
unique_lm_embeddings = {}
for protein_info, sequence in unique_sequences.items():
    s = sequence.split(":")
    unique_lm_embeddings[protein_info] = [
        lm_embeddings[(*protein_info, j)] for j in range(len(s))
    ]

# Expand back to one embedding list per input protein, reusing duplicates.
lm_embeddings = [
    unique_lm_embeddings[protein_info]
    for protein_info in list_of_protein_input
]

As a caveat, my code is based on the v1.0 checkout of the repo.
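A related trick (my own sketch, not something from the repo): once the embeddings dictionary is computed, you can also persist it to disk, e.g. with torch.save, so later docking runs against the same protein skip the ESM step entirely. The file path and variable names here are illustrative:

    import os
    import torch

    EMB_CACHE = "esm_embeddings_cache.pt"  # hypothetical cache file

    if os.path.exists(EMB_CACHE):
        # Reuse embeddings computed in a previous run.
        unique_lm_embeddings = torch.load(EMB_CACHE)
    else:
        # ... compute unique_lm_embeddings as in the snippet above ...
        torch.save(unique_lm_embeddings, EMB_CACHE)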