generatebio / chroma

A generative model for programmable protein design
Apache License 2.0

Seeking advice on conditional binder design #9

Closed vuhongai closed 11 months ago

vuhongai commented 1 year ago

Thanks chroma authors for open-sourcing your amazing work.

I am trying to use chroma to modify only 10 aa of an existing binder, given a receptor. During the sampling phase, I want to sample both the docking of the binder on the receptor's surface and the new backbone of the 10 modified aa simultaneously.

What I have tried so far is shown in the code at the end of this post.

My problem is that with this approach the coordinates of both the receptor and the binder remain unchanged, so no docking is sampled. Could you suggest how I can achieve this? Thank you in advance for your help.

Ai

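import torch
from chroma import Chroma, Protein, conditioners

# Assumes `device` and `chroma = Chroma()` have already been defined.
# The variable must be named `protein`, since every later line refers to it by that name.
protein = Protein(".complex.pdb", device=device)  # receptor in chain B, binder in chain A
X, C, S = protein.to_XCS()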

L_binder = (C == 2).sum().item()
L_receptor = (C == 1).sum().item()
L_complex = L_binder+L_receptor

modify_AAs = [i for i in range(321,333)] #indexes of aa being modified

# keep original seqs of unmodified aa by providing the mask
mask_aa = torch.Tensor(L_complex * [[1] * 20])
for i in range(L_complex):
    if i not in modify_AAs:
        mask_aa[i] = torch.Tensor([0] * 20)
        mask_aa[i][S[0][i].item()] = 1
mask_aa = mask_aa[None].cuda()

residues_to_keep_R = [i for i in range(L_receptor)]
protein.sys.save_selection(gti=residues_to_keep_R, selname="receptor")
conditioner_struc_R = conditioners.SubstructureConditioner(
        protein,
        backbone_model=chroma.backbone_network,
        selection = 'namesel receptor').to(device)

residues_to_keep_B = [i for i in range(L_receptor,L_complex) if i not in modify_AAs]
protein.sys.save_selection(gti=residues_to_keep_B, selname="binder")
conditioner_struc_B = conditioners.SubstructureConditioner(
        protein,
        backbone_model=chroma.backbone_network,
        selection = 'namesel binder', gamma=0.5).to(device)

conditioner = conditioners.ComposedConditioner([conditioner_struc_R, conditioner_struc_B, ])

protein, trajectories = chroma.sample(
    protein_init=protein,
    conditioner=conditioner,
    design_selection = mask_aa,
    langevin_factor=2,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func='langevin',
    full_output=True,
    steps=500,
)

protein.to("sample.cif")
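
The per-residue loop that builds `mask_aa` above can also be written without a Python loop. The following is a minimal sketch, assuming `S` is an integer tensor of shape `(1, L)` as used in the code above; `build_design_mask` is a hypothetical helper name, not part of chroma. Designable positions allow all 20 amino acids, and every other position is restricted to its original residue.

```python
import torch
import torch.nn.functional as F


def build_design_mask(S, design_idx, num_aa=20):
    """Return a (1, L, num_aa) float mask.

    S          : (1, L) integer tensor of residue type indices.
    design_idx : iterable of residue positions that may change identity.
    """
    # One-hot encoding of the original sequence: fixed positions keep only this entry.
    one_hot = F.one_hot(S.long(), num_classes=num_aa).float()

    # Boolean flag per residue: True where any amino acid is allowed.
    designable = torch.zeros(S.shape[-1], dtype=torch.bool, device=S.device)
    idx = torch.as_tensor(list(design_idx), dtype=torch.long, device=S.device)
    designable[idx] = True

    # All ones at designable positions, original one-hot everywhere else.
    return torch.where(designable[None, :, None], torch.ones_like(one_hot), one_hot)
```

With the variables from the post, `mask_aa = build_design_mask(S, modify_AAs)` would give a mask on the same device as `S`, so the separate `.cuda()` call is not needed.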

wujiewang commented 1 year ago

Thanks for your interest! Based on your code, what you see is expected: substructure conditioning freezes the coordinates and the roto-translational motion of the conditioned substructure. It is not designed to be used as a sampler for docking.
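
One way to confirm this behaviour is to compare coordinates before and after sampling. The sketch below is only an illustration: it assumes coordinate arrays of shape `(1, L, 4, 3)` and a chain map `C` of shape `(1, L)` with positive integer chain codes, and `max_displacement_per_chain` is a hypothetical helper, not a chroma function.

```python
import numpy as np


def max_displacement_per_chain(X_before, X_after, C):
    """Largest atom displacement (same units as X) for each chain code in C.

    X_before, X_after : arrays of shape (1, L, 4, 3).
    C                 : array of shape (1, L) with integer chain codes.
    """
    X_before = np.asarray(X_before, dtype=float)
    X_after = np.asarray(X_after, dtype=float)
    C = np.asarray(C)

    # Euclidean displacement of every atom, shape (1, L, 4).
    disp = np.linalg.norm(X_after - X_before, axis=-1)
    # Worst atom within each residue, shape (1, L).
    per_residue = disp.max(axis=-1)

    # Assumption: non-positive codes mark padding or masked residues and are skipped.
    return {
        int(c): float(per_residue[C == c].max())
        for c in np.unique(C)
        if c > 0
    }
```

Tensors from `to_XCS()` would first need converting, for example with `X.numpy(force=True)`. A value near zero for a chain indicates that it did not move during sampling.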

kushnarang commented 1 year ago

I adapted some of your code, and combined it with some of mine. I'm not sure if this is exactly what you want, but this is working for me for now.

The binder gets re-designed and re-positioned on the receptor (I'm not sure whether the re-positioning counts as "docking" per se; my familiarity with the terminology in this field is still limited). The receptor is locked in place and its structure is fixed.

Changing inverse_temperature and langevin_factor to lower values seems to keep the binder closer to its original position. I'm not sure how to forcefully specify a binding position (re: #14). Taking the temperature too low seems to remove all secondary structure formation, though. I'm still trying to ascertain what changing gamma on the conditioner does in practice.

X, C, S = protein.to_XCS()

L_binder = (C == 1).sum().item()
L_receptor = (C == 2).sum().item()
L_complex = L_binder + L_receptor

# Use to show you which chain has which code:
# import numpy as np
# np.unique(C.numpy(force=True), return_counts=True)

# In my case, the binder is 1, and the receptor is 2
residues_to_keep = (C == 2).nonzero(as_tuple=True)[1].tolist()
residues_to_design = (C == 1).nonzero(as_tuple=True)[1].tolist()

# keep original seqs of unmodified aa by providing the mask
mask_aa = torch.Tensor(L_complex * [[1] * 20])
for i in range(L_complex):
    if i not in residues_to_design:
        mask_aa[i] = torch.Tensor([0] * 20)
        mask_aa[i][S[0][i].item()] = 1
mask_aa = mask_aa[None].cuda()

protein.sys.save_selection(gti=residues_to_keep, selname="receptor")
conditioner_struc_R = conditioners.SubstructureConditioner(
        protein,
        # gamma=1,
        backbone_model=chroma.backbone_network,
        selection = 'namesel receptor').to(device)

conditioner = conditioners.ComposedConditioner([ conditioner_struc_R ])

protein_out, trajectories = chroma.sample(
    protein_init=protein,
    conditioner=conditioner,
    design_selection=mask_aa,
    langevin_factor=1,
    langevin_isothermal=True,
    inverse_temperature=2,
    sde_func='langevin',
    full_output=True,
    steps=500,
)

display(protein_out)
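
To compare settings such as inverse_temperature and langevin_factor more systematically, the binder's drift from its starting position can be measured for each combination. This is a rough sketch under stated assumptions: coordinates are NumPy arrays of shape `(1, L, 4, 3)`, `C` has shape `(1, L)`, and `sample_fn` is a hypothetical callable you would write yourself (for example a wrapper around `chroma.sample` that returns the output coordinates as an array). Neither `centroid_shift` nor `sweep` is part of chroma.

```python
import itertools

import numpy as np


def centroid_shift(X_before, X_after, C, chain_id):
    """Distance between the centroids of one chain before and after sampling."""
    sel = (np.asarray(C) == chain_id)[0]  # boolean residue mask, shape (L,)
    before = np.asarray(X_before, dtype=float)[0, sel].reshape(-1, 3).mean(axis=0)
    after = np.asarray(X_after, dtype=float)[0, sel].reshape(-1, 3).mean(axis=0)
    return float(np.linalg.norm(after - before))


def sweep(sample_fn, X_init, C, binder_chain, inverse_temperatures, langevin_factors):
    """Run sample_fn over a grid of settings and record the binder centroid shift.

    sample_fn must accept the keyword arguments inverse_temperature and
    langevin_factor and return coordinates with the same shape as X_init.
    """
    results = {}
    for beta, lf in itertools.product(inverse_temperatures, langevin_factors):
        X_out = sample_fn(inverse_temperature=beta, langevin_factor=lf)
        results[(beta, lf)] = centroid_shift(X_init, X_out, C, binder_chain)
    return results
```

Since sampling is stochastic, several runs per setting would be needed before drawing conclusions from the recorded shifts.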

wujiewang commented 11 months ago

Thanks for all the discussion and feedback! I will close this issue for now. Feel free to reopen it or post a new issue if you feel the need.