Closed 1326093445a closed 1 month ago
Hi @1326093445a, I hope you find it useful :)! The pLDDT histogram is in the output dictionary for the AbodyBuilder3 and AbodyBuilder3-LM models. The histograms can be converted into pLDDT scores using compute_plddt from abodybuilder3.openfold.utils.loss. I made a plddt-example branch where I updated the output_to_pdb utility function to output pLDDT scores in the b-factors column, and updated the example.ipynb notebook to plot these. Let me know if you have further questions.
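For readers wondering what the histogram-to-score conversion involves, here is a minimal plain-Python sketch for a single residue. It assumes the histogram is a vector of per-bin logits and that the conversion follows the usual OpenFold approach (softmax over bins, then the expected bin centre scaled to 0-100). The function name plddt_from_logits is hypothetical and is not part of the repository; in practice you would call compute_plddt on the model's output tensors.

```python
import math


def plddt_from_logits(logits):
    """Hypothetical helper: expected lDDT (0-100) from one residue's bin logits.

    Assumes the bins evenly partition [0, 1], as in OpenFold-style pLDDT heads.
    """
    num_bins = len(logits)
    bin_width = 1.0 / num_bins
    # Centre of each bin: 0.5*w, 1.5*w, ..., (num_bins - 0.5)*w
    centres = [(i + 0.5) * bin_width for i in range(num_bins)]
    # Numerically stable softmax over the bins
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Expected bin centre, rescaled from [0, 1] to [0, 100]
    return 100.0 * sum(p * c for p, c in zip(probs, centres))


# A flat histogram carries no information, so the score sits mid-range.
uniform_score = plddt_from_logits([0.0] * 50)
# A histogram concentrated in the top bin gives a score near the maximum.
confident_score = plddt_from_logits([-1000.0] * 49 + [1000.0])
```

The score is an expectation over bins rather than the position of the most likely bin, so a broad histogram is pulled towards the middle of the range even when its peak is high.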
Hello Henry, thank you for your reply. I will give it a try now :)
Hello Henry and team, when I tried the pLDDT example with the LM model, I kept getting the error RuntimeError: Tensors must have same number of dimensions: got 4 and 5
I was trying to run the model without precomputed embeddings, using my own customized script:
import torch
from torch.utils.data import DataLoader
from abodybuilder3.language.model import ProtT5
# Set the matrix multiplication precision for Tensor Cores
torch.set_float32_matmul_precision('medium') # Change to 'high' if needed
# Example dataset initialization
dataset = ... # Your dataset initialization here
# Initialize DataLoader with increased number of workers
train_loader = DataLoader(dataset, batch_size=32, num_workers=23) # Adjust batch_size as needed
# Continue with your embedding generation or loading logic
use_precomputed = False
heavy = "QVQLVQSGAEVKKPGSSVKVSCKASGGTFSSLAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSEDTAVYYCARGGSVSGTLVDFDIWGQGTMVTVSS"
light = "DIQMTQSPSTLSASVGDRVTITCRASQSISSWLAWYQQKPGKAPKLLIYKASSLESGVPSRFSGSGSGTEFTLTISSLQPDDFATYYCQQYNIYPITFGGGTKVEIK"
embedding_path = "/home/yfeng17/abodybuilder3/method_validation/embedding/example.pt"
if use_precomputed:
    embedding_data = torch.load(embedding_path, weights_only=True)
    embedding = embedding_data["plm_embedding"]
    print(f"Loaded precomputed embedding from {embedding_path}")
else:
    plm = ProtT5()
    embedding = plm.get_embeddings([heavy], [light])
    if isinstance(embedding, list):
        embedding = torch.stack(embedding)
    torch.save({"plm_embedding": embedding}, embedding_path)
    print(f"Saved generated embedding to {embedding_path}")
print(f"Embedding shape: {embedding.shape}")
It gave me:
Saved generated embedding to /home/yfeng17/abodybuilder3/method_validation/embedding/example.pt
Embedding shape: torch.Size([1, 229, 1024])
Later, when I ran:
# Part 3: Prepare input and run the model
from abodybuilder3.utils import string_to_input, add_atom37_to_output, output_to_pdb
# Prepare the input for the model
ab_input = string_to_input(heavy=heavy, light=light)
ab_input["single"] = embedding.unsqueeze(0) # Use PLM for residue feature
ab_input_batch = {
    key: (value.unsqueeze(0).to(device) if key not in ["single", "pair"] else value.to(device))
    for key, value in ab_input.items()
}
# Run the model and generate the PDB structure
output = model(ab_input_batch, ab_input_batch["aatype"])
output = add_atom37_to_output(output, ab_input["aatype"])
pdb_string = output_to_pdb(output, ab_input)
The following error shows up:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[10], line 13
7 ab_input_batch = {
8 key: (value.unsqueeze(0).to(device) if key not in ["single", "pair"] else value.to(device))
9 for key, value in ab_input.items()
10 }
12 # Run the model and generate the PDB structure
---> 13 output = model(ab_input_batch, ab_input_batch["aatype"])
14 output = add_atom37_to_output(output, ab_input["aatype"])
15 pdb_string = output_to_pdb(output, ab_input)
File ~/miniconda3/envs/abodybuilder3/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File ~/miniconda3/envs/abodybuilder3/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File ~/abodybuilder3/src/abodybuilder3/openfold/model/structure_module.py:745, in StructureModule.forward(self, evoformer_output_dict, aatype, mask, inplace_safe, _offload_inference)
743 outputs = []
744 for i in range(self.no_blocks):
--> 745 z = torch.cat(
746 (
747 z_initial,
748 self.pairwise_distance_feature_map(rigids, mask)
749 .unsqueeze(-1)
750 .to(z_initial.dtype),
751 ),
752 dim=-1,
753 )
755 # [*, N, C_s]
756 s = s + self.ipa_layers[i](
757 s,
758 z,
(...)
763 _z_reference_list=z_reference_list,
764 )
RuntimeError: Tensors must have same number of dimensions: got 4 and 5
I think the issue is that your pre-computed embedding already has a singleton batch dimension:
Embedding shape: torch.Size([1, 229, 1024])
And then an additional one is added when preparing the input dictionary ab_input on this line:
ab_input["single"] = embedding.unsqueeze(0) # Use PLM for residue feature
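One way to guard against the doubled batch dimension is to check the embedding's rank before deciding whether to call unsqueeze(0). The sketch below works on a plain shape tuple so it runs without any model code. The helper name needs_batch_dim is hypothetical, and the expected rank of 3 (batch, residues, channels) is inferred from the shapes in this thread, not taken from the library's documentation.

```python
def needs_batch_dim(shape, expected_rank=3):
    """Hypothetical helper: True if a leading batch dimension must be added.

    `shape` is a tuple such as tuple(embedding.shape). The default rank of 3
    assumes the model wants "single" as (batch, residues, channels).
    """
    rank = len(shape)
    if rank == expected_rank:
        return False  # already batched, e.g. (1, 229, 1024)
    if rank == expected_rank - 1:
        return True  # unbatched, e.g. (229, 1024)
    raise ValueError(
        f"Unexpected embedding shape {shape}: expected rank "
        f"{expected_rank - 1} or {expected_rank}"
    )


print(needs_batch_dim((229, 1024)))     # True
print(needs_batch_dim((1, 229, 1024)))  # False

# Intended use with a tensor (shown as comments, since it needs torch):
# if needs_batch_dim(tuple(embedding.shape)):
#     embedding = embedding.unsqueeze(0)
# ab_input["single"] = embedding
```

Raising on any other rank makes a shape such as (1, 1, 229, 1024) fail immediately at input preparation, rather than deep inside the structure module.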
I added the embedding for the example structure given in the notebook to the repo (data/structures/structures_plm/6yio_H0-L0.pt) on the plddt-example branch. You should be able to run the notebook on that branch without modifying it. Can you give that a try?
Hello Henry and all other team members. I found out why it did not work: Part 3 was still using the precomputed embedding, whereas I wanted it to run purely without precomputation. Thank you for your time and help :)
Glad you got it working!
Hello All
First, thank you for your wonderful work on antibody modelling. I am just curious: when I ran the example using the notebook, everything worked fine, but the output did not include the pLDDT score. Do I need to adjust the code to get the pLDDT output?