generatebio / chroma

A generative model for programmable protein design
Apache License 2.0

Doc on design_selection in chroma.design #5

Closed v-shaoningli closed 1 year ago

v-shaoningli commented 1 year ago

Hi all, thanks for your work on Chroma. Could you please provide a description of design_selection in chroma.design (rules or something)? The source code is a little complex to understand. Thanks for your help.

Best, Shaoning.

wujiewang commented 1 year ago

Thanks for your interest! We should improve the docs on this. The more straightforward way of imposing a mask is to provide a binary mask with shape (num_batch, num_residues). For example,

import torch
from chroma import Chroma
chroma = Chroma()
design_mask = torch.Tensor([0] * 25 + [1] * 25)[None].cuda()  # 0 = not designed, 1 = designed
protein = chroma.sample(chain_lengths=[50], design_selection=design_mask)
print(protein.sequence())

output:

AAAAAAAAAAAAAAAAAAAAAAAAAIGPDNTRESVYWKMLSQQARAAAAA

As you can see, the first 25 residues are dummy residues (Ala), and the remaining 25 residues are designed.
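If the designable region is not a simple prefix or suffix, it can be easier to build the mask from a list of residue indices. Here is a minimal sketch in plain Python; `binary_design_mask` is a hypothetical helper, not part of Chroma, and it assumes the convention above (1 = designed, 0 = not designed) with 0-based indices.

```python
def binary_design_mask(num_residues, designable):
    """Return a (1, num_residues) nested list: 1.0 = designed, 0.0 = not designed."""
    designable = set(designable)
    out_of_range = [i for i in designable if not 0 <= i < num_residues]
    if out_of_range:
        raise ValueError(f"indices out of range: {sorted(out_of_range)}")
    row = [1.0 if i in designable else 0.0 for i in range(num_residues)]
    return [row]  # leading batch dimension of size 1


# Same mask as in the example above: residues 25..49 (0-based) are designed.
mask = binary_design_mask(50, range(25, 50))
print(mask[0][:3], mask[0][-3:])

# To pass it to chroma.sample (assumes torch and a GPU are available):
#   design_mask = torch.Tensor(mask).cuda()
```

The nested list only exists so the sketch runs without torch installed; in practice you would convert it to a tensor before passing it as design_selection.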

Alternatively, you can provide a selection string, although the grammar is admittedly not very intuitive. You can see some examples here in the unit tests. We will try to figure out a way to describe the grammar better.

wujiewang commented 1 year ago

You can also provide a mask with shape (num_batch, num_residues, 20). This lets you specify the allowable amino acids at each position, e.g.

 [[[0, 1, 1, ..., 1],
   [1, 1, 1, ..., 1]]]
... ... ...

This means that Ala is not allowed at the first residue position (the first row of the mask), while all 20 amino acids are allowed at the second. For the canonical amino acid alphabet order, see here
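A per-residue mask like this can be built programmatically. The sketch below is plain Python and only illustrates the shape and the 0/1 convention. `aa_design_mask` and `AA20` are hypothetical names, and the alphabet string is an assumption (alphabetical one-letter codes, consistent with Ala being index 0 above), so please check it against the canonical order in the Chroma source.

```python
# ASSUMPTION: alphabetical one-letter order with Ala at index 0. Verify this
# against the canonical alphabet in the Chroma source before relying on it.
AA20 = "ACDEFGHIKLMNPQRSTVWY"


def aa_design_mask(num_residues, forbidden):
    """Return a (1, num_residues, 20) nested list; 1.0 = allowed, 0.0 = forbidden.

    forbidden maps a 0-based residue index to a string of one-letter codes.
    """
    mask = [[1.0] * len(AA20) for _ in range(num_residues)]
    for pos, letters in forbidden.items():
        for aa in letters:
            mask[pos][AA20.index(aa)] = 0.0
    return [mask]  # leading batch dimension of size 1


# Forbid Ala at the first residue, and Cys and Met at the third.
mask = aa_design_mask(50, {0: "A", 2: "CM"})
print(mask[0][0][:3])  # first residue: the Ala column is 0.0

# To pass it to chroma.sample (assumes torch and a GPU are available):
#   design_mask = torch.Tensor(mask).cuda()
```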

v-shaoningli commented 1 year ago

Passing a design_mask is more straightforward than the selection expression. Thanks a lot for your help.