sky1ove opened this issue 2 months ago

I'm testing a dataset of ligands against the same protein. Instead of loading the same protein PDB file every time, is there any way to load the PDB once and just start docking each new ligand?
Anything short of modifying the code a bit won't work.

One way would be to "memoize" the function you suspect is the slowest using joblib's Memory cache. Basically, find the function that causes you the most delay and annotate it with the @cache decorator. AFAIK this is the simplest way to do what you want.
Let me know if this helps.
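A minimal sketch of that approach (the function name and body are placeholders for whichever step profiling shows is slowest in your run):

```python
from joblib import Memory

# Persist results to disk; identical calls after the first are read from the cache.
memory = Memory("./joblib_cache", verbose=0)

@memory.cache
def slow_preprocess(sequence: str) -> str:
    # Stand-in for the expensive step, e.g. computing an ESM embedding.
    import time
    time.sleep(5)
    return sequence.upper()

print(slow_preprocess("MKTAYIAK"))  # slow on the first call
print(slow_preprocess("MKTAYIAK"))  # near-instant: read from the disk cache
```

The cache is keyed on the function's arguments and persists across processes, so a second run over the same protein skips the recomputation entirely.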
What I do locally is memoize the ESM embeddings. Since the most expensive computation during target preprocessing is ESM, caching it shrinks the preprocessing time substantially. It can be done by modifying the code like this:
```python
# Deduplicate the inputs so each unique protein is embedded only once.
unique_sequences = compute_unique(list_of_protein_input)

# Chains are separated by ":"; label each chain with (protein_info, chain_index).
labels, sequences = [], []
for protein_info, sequence in unique_sequences.items():
    s = sequence.split(":")
    sequences.extend(s)
    labels.extend([(*protein_info, j) for j in range(len(s))])

# Run ESM once over the deduplicated chains.
lm_embeddings = compute_ESM_embeddings(model, alphabet, labels, sequences)

# Regroup the per-chain embeddings under each unique protein.
unique_lm_embeddings = {}
for protein_info, sequence in unique_sequences.items():
    s = sequence.split(":")
    unique_lm_embeddings[protein_info] = [
        lm_embeddings[(*protein_info, j)] for j in range(len(s))
    ]

# Expand back to the original (possibly duplicated) input order.
lm_embeddings = [
    unique_lm_embeddings[protein_info]
    for protein_info in list_of_protein_input
]
```
A word of caution: my code is based on a checkout of the v1.0 code.
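If the same proteins recur across separate runs, the regrouped dictionary can also be persisted to disk so that later runs skip ESM entirely. A sketch, assuming the embeddings are torch tensors (the cache file name is made up):

```python
import os
import torch

CACHE_PATH = "esm_embeddings.pt"  # hypothetical cache file

if os.path.exists(CACHE_PATH):
    # Later runs: reuse the saved embeddings and skip ESM.
    unique_lm_embeddings = torch.load(CACHE_PATH)
else:
    # First run: compute as in the snippet above, then save for next time.
    unique_lm_embeddings = {}
    for protein_info, sequence in unique_sequences.items():
        s = sequence.split(":")
        unique_lm_embeddings[protein_info] = [
            lm_embeddings[(*protein_info, j)] for j in range(len(s))
        ]
    torch.save(unique_lm_embeddings, CACHE_PATH)
```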