RosettaCommons / protein_generator

Joint sequence and structure generation with RoseTTAFold sequence space diffusion
https://huggingface.co/spaces/merle/PROTEIN_GENERATOR
MIT License

Is there a way to fix residues at defined positions in binder design #7

Closed amin-sagar closed 1 year ago

amin-sagar commented 1 year ago

Hello. Thanks for this awesome work. I am wondering if there is a way to fix certain residues at defined positions in the sequence for binder design. For example, I know my binder should have an F at position 6 and a W at position 9 in an 18-residue peptide. I tried:

python /home/amin/softwares/protein_generator/inference.py \
    --num_designs 50 \
    --out output3/binder_design \
    --pdb Rec-Pep.pdb \
    --T 25 --save_best_plddt \
    --strand_bias 0.1 \
    --contigs A1-254,0 15 \
    --aa_spec XXXXXFXXWXXXXXXXXX \
    --hotspots A79,A93,A97,A100,A163,A199,A234,A235,A236,A237,A238,A241,A242,A245,A246,A249

I would really appreciate any suggestions. Best, Amin.
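
As a minimal sketch of how a specification string like the one above can be built without counting characters by hand, the helper below places fixed residues at 1-based positions and fills every other position with `X`. The function name `build_spec` and the use of `X` as the "free" character are assumptions made for illustration; this is not code from protein_generator.

```python
def build_spec(length, fixed, free_char="X"):
    """Return a string of `length` characters with fixed residues at 1-based positions.

    `fixed` maps a 1-based position to a one-letter amino acid code.
    `free_char` marks positions that are left for the model to design (assumed "X").
    """
    spec = [free_char] * length
    for pos, aa in fixed.items():
        if not 1 <= pos <= length:
            raise ValueError(f"position {pos} is outside 1..{length}")
        spec[pos - 1] = aa  # convert the 1-based position to a 0-based index
    return "".join(spec)


# F at position 6 and W at position 9 of an 18-residue peptide
spec = build_spec(18, {6: "F", 9: "W"})
print(spec)
```

Keeping the positions in a dictionary makes it easy to check that the string length matches the intended peptide length before passing it on the command line.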

Lyang556 commented 1 year ago

@amin-sagar You can use dl_binder_design (https://github.com/nrbennet/dl_binder_design).

amin-sagar commented 1 year ago

Thanks @Lyang556. But dl_binder_design seems to follow a different method, and I don't have a license for Rosetta. If this is not currently possible with protein_generator, I can try to write the code for it. Could you give me some pointers on how to approach this?

0merle0 commented 1 year ago

Hi, sorry, just seeing this now! Are you looking to fix both sequence and structure for these residues, or just the sequence?

A hacky way to get around this is to add these residues to your PDB file and then specify them in the contigs. Alternatively, I can write some code to parse the sequence input so that certain residues in the binder can be specified.

amin-sagar commented 1 year ago

Thanks @0merle0. I would like to fix just the sequence. Adding these residues to the PDB and specifying them in the contigs could be a nice way of doing it when starting from a known binder with a structure. But I am looking to generate new binders that have some residues at defined positions, because I know only the binding motif from experiments, not the structure. In ColabDesign, this is done by specifying an N x 20 matrix that sets a per-position weight (bias) for each of the 20 amino acids. A residue can be fixed by giving it a very large value in this matrix, and undesired residues can be disallowed with negative values. Could a similar strategy work here?
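
To make the matrix idea in this comment concrete, here is a small, generic sketch of an additive per-position bias applied to model logits before a softmax. It is not ColabDesign's API and not something protein_generator is stated to support; the names `make_bias` and `biased_probs`, the alphabet ordering, and the `strength` value are all assumptions chosen for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (ordering is an assumption)


def make_bias(length, fixed=None, banned=None, strength=1e6):
    """Build a length x 20 matrix (list of rows) of additive logit offsets.

    fixed:  {1-based position: residue to force}
    banned: {1-based position: string of residues to disallow}
    """
    bias = [[0.0] * len(ALPHABET) for _ in range(length)]
    for pos, aa in (fixed or {}).items():
        bias[pos - 1][ALPHABET.index(aa)] = strength   # very large value fixes the residue
    for pos, aas in (banned or {}).items():
        for aa in aas:
            bias[pos - 1][ALPHABET.index(aa)] = -strength  # large negative value disallows it
    return bias


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_probs(logits, bias):
    """Add the bias to the logits position by position, then normalise each row."""
    return [
        softmax([l + b for l, b in zip(logit_row, bias_row)])
        for logit_row, bias_row in zip(logits, bias)
    ]


# Uniform (all-zero) logits stand in for real model output in this sketch.
logits = [[0.0] * 20 for _ in range(18)]
bias = make_bias(18, fixed={6: "F", 9: "W"}, banned={1: "CP"})
probs = biased_probs(logits, bias)
```

With this setup, position 6 puts essentially all probability on F, position 9 on W, and position 1 assigns zero probability to C and P while leaving the other residues free.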

0merle0 commented 1 year ago

Hey, sorry this took me a little while to get to, busy week! I made the fix, so it should now be possible to specify contigs and a sequence input simultaneously. The sequence is merged into the contig representation as long as you specify a fixed-length contig and a sequence of the same length. Any position in the sequence that is not a mask token will be overwritten with the specified amino acid at that position. This is an example input:

./inference.py \
    --num_designs 10 \
    --out examples/out/design \
    --pdb examples/pdbs/rsv5_5tpn.pdb \
    --contigs 25,A163-181,25 \
    --T 25 --save_best_plddt \
    --sequence XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGGGGGGGGG
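
The merging rule described in this comment can be sketched roughly as follows. This is an illustration only, not the repository's implementation: `contig_length` and `merge_sequence` are hypothetical helpers, `X` is assumed to be the mask token (as the example input suggests), and only two segment forms are handled, a plain integer length and a chain range such as `A163-181`.

```python
import re

MASK = "X"  # assumed mask token, inferred from the example input


def contig_length(contigs):
    """Total length of a fixed-length contig string such as '25,A163-181,25'."""
    total = 0
    for seg in contigs.split(","):
        m = re.fullmatch(r"[A-Za-z](\d+)-(\d+)", seg)
        if m:
            # Chain segment, e.g. A163-181; the range is inclusive on both ends
            total += int(m.group(2)) - int(m.group(1)) + 1
        elif seg.isdigit():
            # Free segment of a fixed length, e.g. 25
            total += int(seg)
        else:
            raise ValueError(f"unsupported or variable-length segment: {seg!r}")
    return total


def merge_sequence(contig_seq, sequence):
    """Overwrite contig_seq with every non-mask residue of `sequence`."""
    if len(contig_seq) != len(sequence):
        raise ValueError("sequence and contig must have the same length")
    return "".join(c if s == MASK else s for c, s in zip(contig_seq, sequence))


# The contig from the example: 25 free + 19 motif residues (163..181) + 25 free
print(contig_length("25,A163-181,25"))

# Toy case: 'ACDE' is a made-up placeholder for motif residues read from a PDB file
toy_contig_seq = "XXX" + "ACDE" + "XXX"       # corresponds to contigs '3,A10-13,3'
merged = merge_sequence(toy_contig_seq, "XXXXXXXXGG")
print(merged)
```

Checking the total contig length against the sequence length before running inference catches the most likely mistake, a sequence string that is one or two characters off.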

amin-sagar commented 1 year ago

Thanks @0merle0. It works perfectly.