dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License

Tying Multiple Chains Independently #76

Closed (brucejwittmann closed 9 months ago)

brucejwittmann commented 9 months ago

Say I have a hetero-oligomer in which chains A, B, and C share one sequence and chains D, E, and F share a different sequence. Is there a way to tie the residues in A, B, and C together while independently tying the residues of D, E, and F together? I was looking at submit_example_5.sh, but that seems to suggest it's only possible to tie positions together for homo-oligomers.

dauparas commented 9 months ago

Yes, have a look at this helper script (https://github.com/dauparas/ProteinMPNN/blob/main/helper_scripts/make_tied_positions_dict.py). You can pass your own custom dictionary via the --tied_positions_jsonl flag.


```
# One dictionary inside the list specifies one symmetry between residues. The example below has two dictionaries, which tie residues
# A:1, B:1, C:1 together and also independently tie D:1, E:1, F:1 together.
{"my_protein_name": [{"A": [1], "B": [1], "C": [1]}, {"D": [1], "E": [1], "F": [1]}]}
# One can also specify multiple residues from the same chain, e.g.
{"my_protein_name": [{"A": [1,4,9], "B": [1], "C": [1]}]}
# In this case residues A:1,4,9, B:1, C:1 will be tied together.
```

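To make the grouping concrete, here is a minimal sketch that lists which chain:residue pairs each dictionary in the first example ties together. It only mirrors the reading described in the comment, not ProteinMPNN's own parsing code, and `tied_groups` is a hypothetical helper name.

```python
import json

def tied_groups(entry):
    """Return one list of 'chain:residue' labels per tie dictionary.

    `entry` is the list stored under the protein name, i.e. a list of
    dictionaries mapping chain IDs to lists of residue indices.
    """
    return [
        [f"{chain}:{pos}" for chain, positions in group.items() for pos in positions]
        for group in entry
    ]

# One line of a tied-positions JSONL file, copied from the example in this thread.
line = '{"my_protein_name": [{"A": [1], "B": [1], "C": [1]}, {"D": [1], "E": [1], "F": [1]}]}'
spec = json.loads(line)

groups = tied_groups(spec["my_protein_name"])
for group in groups:
    # Each printed line is one independent set of tied residues.
    print(" = ".join(group))
```

Each dictionary yields one independent group, so the A/B/C tie and the D/E/F tie never mix.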
brucejwittmann commented 9 months ago

Thanks, Justas! Just to make sure I've got it correct, if I want to make sure that chains A and B are identical and chains C and D are identical after sampling, would my dict look like Option 1 or Option 2?

Option 1: {"my_protein_name": [{"A": [1], "B": [1]}, {"C": [1], "D": [1]}, {"A": [2], "B": [2]}, {"C": [2], "D": [2]}...]}

Option 2: {"my_protein_name": [{"A": [1, 2, ...], "B": [1, 2, ...]}, {"C": [1, 2, ...], "D": [1, 2, ...]}]}

It looks like it would be Option 1 based on the example in example_5_outputs/tied_pdbs.jsonl.

And I assume that you mean the dictionary would be passed as a json to --tied_positions_jsonl of the protein_mpnn_run.py script?

dauparas commented 9 months ago

Option 1 is what you want. Option 2 would force every residue in chain A and chain B (and likewise in chains C and D) to be the same amino acid, since everything is tied together, e.g. chain A seq. = "GGGG", chain B seq. = "GGGG". Yes, pass your dictionary via --tied_positions_jsonl of the protein_mpnn_run.py script.
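For longer chains, writing Option 1 by hand gets tedious, so a small generator may help. The sketch below is not part of the repository: `build_option1_ties` is a hypothetical helper, and it assumes 1-based residue indices (as in the examples in this thread) and that all chains within a group have the same length.

```python
import json

def build_option1_ties(groups):
    """Build one tie dictionary per residue position for each group of chains.

    `groups` is a list of (chain_ids, length) pairs, e.g. (["A", "B"], 120).
    Positions are 1-based here, which is an assumption taken from the thread.
    """
    max_len = max(length for _, length in groups)
    ties = []
    for pos in range(1, max_len + 1):
        for chains, length in groups:
            if pos <= length:
                # Tie position `pos` across the chains of this group only.
                ties.append({chain: [pos] for chain in chains})
    return ties

# Chains A and B are identical, chains C and D are identical; two residues each
# to keep the output short.
ties = build_option1_ties([(["A", "B"], 2), (["C", "D"], 2)])
jsonl_line = json.dumps({"my_protein_name": ties})
print(jsonl_line)
# → {"my_protein_name": [{"A": [1], "B": [1]}, {"C": [1], "D": [1]}, {"A": [2], "B": [2]}, {"C": [2], "D": [2]}]}
```

Writing `jsonl_line` to a file gives something that can be passed to --tied_positions_jsonl. Because each dictionary holds exactly one position per chain, no two positions within a chain are ever tied to each other, which is what avoids the "GGGG" collapse of Option 2.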

brucejwittmann commented 9 months ago

Perfect. Thanks for your help! Closing this :)