gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License

Any way to dock many ligands with the same protein instead of loading the same protein every time? #212

Open sky1ove opened 2 months ago

sky1ove commented 2 months ago

I'm testing a dataset of ligands against the same protein. Instead of loading the same protein PDB file every time, is there any way to load the PDB once and just start docking new ligands?

tornikeo commented 2 months ago

Anything short of modifying the code a bit won't work.

One way would be to "memoize" the function you suspect is the slowest using joblib's memory cache. Basically, find the function that causes the most delay and decorate it with joblib's cache decorator. AFAIK this is the simplest way to do what you want.
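In case it helps, here is a minimal sketch of that idea using joblib.Memory. The function name and the choice of wrapping the receptor preprocessing are assumptions on my part, not DiffDock's actual API:

    from joblib import Memory

    # Disk-backed cache; results persist across runs in ./.joblib_cache
    memory = Memory("./.joblib_cache", verbose=0)

    @memory.cache
    def preprocess_receptor(pdb_path):
        # Hypothetical wrapper around whatever slow step you identified
        # (e.g. the ESM embedding of the protein sequence).
        ...

    # The first call computes and stores the result; subsequent calls with
    # the same pdb_path load it from disk instead of recomputing.
    receptor = preprocess_receptor("protein.pdb")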

Let me know if this helps.

demian3b commented 2 months ago

What I do locally is memoize the ESM embeddings. Since the most expensive computation during target preprocessing is ESM, caching it reduces preprocessing time a lot. It can be done by modifying the code like this:

# Collect the unique protein sequences so each one is embedded only once.
unique_sequences = compute_unique(list_of_protein_input)

labels, sequences = [], []
for protein_info, sequence in unique_sequences.items():
    # Multi-chain proteins are stored as ":"-separated chain sequences.
    s = sequence.split(":")
    sequences.extend(s)
    labels.extend([(*protein_info, j) for j in range(len(s))])

# Run ESM once over the deduplicated chain sequences.
lm_embeddings = compute_ESM_embeddings(model, alphabet, labels, sequences)

# Regroup the per-chain embeddings by protein.
unique_lm_embeddings = {}
for protein_info, sequence in unique_sequences.items():
    s = sequence.split(":")
    unique_lm_embeddings[protein_info] = [
        lm_embeddings[(*protein_info, j)] for j in range(len(s))
    ]

# Expand back to one embedding list per input protein, reusing duplicates.
lm_embeddings = [
    unique_lm_embeddings[protein_info]
    for protein_info in list_of_protein_input
]

As a caveat, my code is based on the v1.0 checkout of the repo.
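A related trick (my own sketch, not something from the repo): once the embeddings dictionary is computed, you can also persist it to disk, e.g. with torch.save, so later docking runs against the same protein skip the ESM step entirely. The file path and variable names here are illustrative:

    import os
    import torch

    EMB_CACHE = "esm_embeddings_cache.pt"  # hypothetical cache file

    if os.path.exists(EMB_CACHE):
        # Reuse embeddings computed in a previous run.
        unique_lm_embeddings = torch.load(EMB_CACHE)
    else:
        # ... compute unique_lm_embeddings as in the snippet above ...
        torch.save(unique_lm_embeddings, EMB_CACHE)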