dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License
934 stars 284 forks

Using multiple input structures for a single sequence output #57

Open acarbn opened 1 year ago

acarbn commented 1 year ago

I am trying to design a sequence for a multi-state protein. It has four available conformations that differ from each other. How can I tie all residues across these four PDBs so that I can design a single sequence optimised for all four structures? I couldn't find an easy way to do this after looking through the examples.

dauparas commented 1 year ago

You can build a tied-residue dictionary using this helper script https://github.com/dauparas/ProteinMPNN/blob/main/helper_scripts/make_pos_neg_tied_positions_dict.py (--homooligomer 1 --input_path "my_path_to_input.pdb" --output_path "my_path_to_output.jsonl"). Then pass the tied position .jsonl to the main script using this flag: --tied_positions_jsonl
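To make the idea of a tied-residue dictionary concrete, here is a minimal Python sketch of how such a file could be built by hand. It rests on assumptions not confirmed in this thread: that the four conformations are stored as chains A to D of a single parsed entry, and that each tied position is written as a `{chain: [residue_index]}` mapping under the structure name. The function `build_tied_positions`, the entry name `my_protein`, and the chain length are hypothetical. The exact schema that --tied_positions_jsonl expects (including the per-chain beta weights written by the pos_neg helper) should be checked against the helper script itself.

```python
import json

def build_tied_positions(name, chains, length):
    """Tie residue i of every chain to residue i of all other chains.

    Assumed layout (check against the helper script): one list entry per
    residue position, each mapping chain ID -> [1-based residue index].
    """
    tied = [{chain: [i] for chain in chains} for i in range(1, length + 1)]
    return {name: tied}

# Hypothetical example: four conformations stored as chains A-D of one entry,
# each 5 residues long (a real protein would use its true length).
entry = build_tied_positions("my_protein", ["A", "B", "C", "D"], length=5)

# A .jsonl file holds one JSON object per line.
line = json.dumps(entry)
print(line)
```

With this layout, every position is sampled once and shared by all tied chains, which is what a single sequence for several conformations would require. Writing `line` to a file would give something that could be compared with the helper script's own output.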

acarbn commented 1 year ago

Thanks. It is a monomer, not an oligomer. Would I still need --homooligomer 1? I tried the following:

python $pmpnn/helper_scripts/make_pos_neg_tied_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_tied_positions --homooligomer 1 --pos_neg_chain_list="A" --pos_neg_chain_betas "1.0"

The script required the --pos_neg_chain_list and --pos_neg_chain_betas flags in order to run, which is why I added them (I also gave all structures the same chain ID, A). Now I get a sequence for each structure, but only one of them is actually redesigned; the rest return the same sequence as the input, with NaN scores. Does this mean it considered and used all the structures but produced only one sequence?