Exscientia / abodybuilder3

plDDT score is missing #9

Closed 1326093445a closed 1 month ago

1326093445a commented 1 month ago

Hello All

First, thank you for your wonderful work on antibody modelling. I am just curious: when I ran the example notebook, everything worked fine, but the output did not include the pLDDT score. Do I need to adjust the code to get the pLDDT output?

henrykenlay commented 1 month ago

Hi @1326093445a, I hope you find it useful :)! The pLDDT histogram is in the output dictionary for the AbodyBuilder3 and AbodyBuilder3-LM models. The histograms can be converted into pLDDT scores using compute_plddt from abodybuilder3.openfold.utils.loss. I made a plddt-example branch where I updated the output_to_pdb utility function to write pLDDT scores to the b-factor column, and updated the example.ipynb notebook to plot them. Let me know if you have further questions.
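
As a rough illustration of what that conversion involves (a plain-Python sketch, not the code in abodybuilder3.openfold.utils.loss): assuming the histogram holds one row of logits per residue over equal-width lDDT bins spanning 0 to 1, the pLDDT is the softmax-weighted average of the bin centres, scaled to 0-100. The function name plddt_from_logits is made up for this example.

```python
import math


def plddt_from_logits(logits):
    """Return one pLDDT value (0-100) per residue.

    `logits` is a list of rows, one per residue. Each row holds unnormalised
    scores over equal-width lDDT bins spanning [0, 1] (an assumption here).
    """
    scores = []
    for row in logits:
        num_bins = len(row)
        # Softmax with max-subtraction for numerical stability.
        peak = max(row)
        weights = [math.exp(x - peak) for x in row]
        total = sum(weights)
        # Expected bin centre under the softmax distribution.
        expected = sum(
            (w / total) * ((i + 0.5) / num_bins) for i, w in enumerate(weights)
        )
        scores.append(100.0 * expected)
    return scores
```

A flat histogram gives a score in the middle of the range, while a histogram peaked in the top bin gives a score close to 100. For real model outputs, the library's compute_plddt is the function to use.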

1326093445a commented 1 month ago

Hello Henry, thank you for your reply. I will give it a try now :)

1326093445a commented 1 month ago

Hello Henry and team, when I tried the pLDDT example with the LM model, I kept getting the error RuntimeError: Tensors must have same number of dimensions: got 4 and 5

I was trying to run the model without pre-computed embeddings, using my own customized script:

import torch
from torch.utils.data import DataLoader
from abodybuilder3.language.model import ProtT5

# Set the matrix multiplication precision for Tensor Cores
torch.set_float32_matmul_precision('medium')  # Change to 'high' if needed

# Example dataset initialization
dataset = ...  # Your dataset initialization here

# Initialize DataLoader with increased number of workers
train_loader = DataLoader(dataset, batch_size=32, num_workers=23)  # Adjust batch_size as needed

# Continue with your embedding generation or loading logic
use_precomputed = False
heavy = "QVQLVQSGAEVKKPGSSVKVSCKASGGTFSSLAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSEDTAVYYCARGGSVSGTLVDFDIWGQGTMVTVSS"
light = "DIQMTQSPSTLSASVGDRVTITCRASQSISSWLAWYQQKPGKAPKLLIYKASSLESGVPSRFSGSGSGTEFTLTISSLQPDDFATYYCQQYNIYPITFGGGTKVEIK"
embedding_path = "/home/yfeng17/abodybuilder3/method_validation/embedding/example.pt"

if use_precomputed:
    embedding_data = torch.load(embedding_path, weights_only=True)
    embedding = embedding_data["plm_embedding"]
    print(f"Loaded precomputed embedding from {embedding_path}")
else:
    plm = ProtT5()
    embedding = plm.get_embeddings([heavy], [light])

    if isinstance(embedding, list):
        embedding = torch.stack(embedding)

    torch.save({"plm_embedding": embedding}, embedding_path)
    print(f"Saved generated embedding to {embedding_path}")

print(f"Embedding shape: {embedding.shape}")

It gave me:

Saved generated embedding to /home/yfeng17/abodybuilder3/method_validation/embedding/example.pt
Embedding shape: torch.Size([1, 229, 1024])

Later, when I ran:

# Part 3: Prepare input and run the model
from abodybuilder3.utils import string_to_input, add_atom37_to_output, output_to_pdb

# Prepare the input for the model
ab_input = string_to_input(heavy=heavy, light=light)
ab_input["single"] = embedding.unsqueeze(0)  # Use PLM for residue feature
ab_input_batch = {
    key: (value.unsqueeze(0).to(device) if key not in ["single", "pair"] else value.to(device))
    for key, value in ab_input.items()
}

# Run the model and generate the PDB structure
output = model(ab_input_batch, ab_input_batch["aatype"])
output = add_atom37_to_output(output, ab_input["aatype"])
pdb_string = output_to_pdb(output, ab_input)

This error shows up:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[10], line 13
      7 ab_input_batch = {
      8     key: (value.unsqueeze(0).to(device) if key not in ["single", "pair"] else value.to(device))
      9     for key, value in ab_input.items()
     10 }
     12 # Run the model and generate the PDB structure
---> 13 output = model(ab_input_batch, ab_input_batch["aatype"])
     14 output = add_atom37_to_output(output, ab_input["aatype"])
     15 pdb_string = output_to_pdb(output, ab_input)

File ~/miniconda3/envs/abodybuilder3/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/abodybuilder3/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/abodybuilder3/src/abodybuilder3/openfold/model/structure_module.py:745, in StructureModule.forward(self, evoformer_output_dict, aatype, mask, inplace_safe, _offload_inference)
    743 outputs = []
    744 for i in range(self.no_blocks):
--> 745     z = torch.cat(
    746         (
    747             z_initial,
    748             self.pairwise_distance_feature_map(rigids, mask)
    749             .unsqueeze(-1)
    750             .to(z_initial.dtype),
    751         ),
    752         dim=-1,
    753     )
    755     # [*, N, C_s]
    756     s = s + self.ipa_layers[i](
    757         s,
    758         z,
   (...)
    763         _z_reference_list=z_reference_list,
    764     )

RuntimeError: Tensors must have same number of dimensions: got 4 and 5

henrykenlay commented 1 month ago

I think the issue is that your embedding already has a singleton batch dimension:

Embedding shape: torch.Size([1, 229, 1024])

An additional one is then added when preparing the input dictionary ab_input on this line:

ab_input["single"] = embedding.unsqueeze(0)  # Use PLM for residue feature

I added the embedding for the example structure given in the notebook to the repo (data/structures/structures_plm/6yio_H0-L0.pt) on the plddt-example branch. You should be able to run the notebook on that branch without modifying it, can you give that a try?
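
One way to guard against this double batching is to normalise the shape before assigning it to ab_input["single"]. The sketch below is only an illustration: the helper name batched_single_shape is hypothetical, and it assumes the model expects "single" with shape (1, residues, channels), which is what the notebook's unsqueeze(0) on an unbatched embedding suggests.

```python
def batched_single_shape(shape):
    """Return the (batch, residues, channels) shape for a PLM embedding.

    Accepts an unbatched (residues, channels) shape or one that already has
    a singleton batch dimension, and rejects anything else.
    """
    shape = tuple(shape)
    if len(shape) == 2:
        # Unbatched embedding: add the missing batch dimension.
        return (1, *shape)
    if len(shape) == 3 and shape[0] == 1:
        # Already batched: leave it alone instead of adding a second one.
        return shape
    raise ValueError(f"unexpected embedding shape: {shape}")
```

With the tensors from this thread, embedding.reshape(batched_single_shape(embedding.shape)) could stand in for the unconditional embedding.unsqueeze(0), so both the [229, 1024] and the [1, 229, 1024] embedding end up as [1, 229, 1024].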

1326093445a commented 1 month ago

Hello Henry and all other team members. I found out why it did not work: Part 3 was still using the pre-computed path, whereas I wanted it to be purely non-pre-computed. Thank you for your time and help :)

henrykenlay commented 1 month ago

Glad you got it working!