BytedProtein / ByProt

Apache License 2.0

Inpaint for multichain #23

Status: open. BSharmi opened this issue 1 month ago.

BSharmi commented 1 month ago

Hi there!

How do I do inpainting on a multichain protein? For example, I want all 3 chains in a trimer to get the same changes. ProteinMPNN has a tied_positions option for this, but I did not see anything equivalent in the LM-Design (protein_mpnn_cmlm) modules. From the examples, it seems that designer.set_structure() works the same way for single-chain and multichain inputs, and that we do not need to pass chain_list=["A", "B", "C"], masked_chain_list=["A"]. However, later on I get a KeyError from multichain.py in datasets, raised at masked_chains = b['masked_list'].
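For context on the tied_positions idea: tying forces a set of positions (here, the same residue index in each chain) to receive one shared amino acid. The sketch below is a simplified, greedy illustration of that concept, not ByProt or LM-Design code; decode_tied and the alphabet order are hypothetical. It averages the per-position log-probabilities over each tied group and writes a single argmax residue into every member of the group. ProteinMPNN itself combines the logits of tied positions during autoregressive sampling, so treat this only as a picture of the concept.

```python
import numpy as np

# Hypothetical sketch of tied decoding; not part of ByProt or LM-Design.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed order of the 20 canonical amino acids


def decode_tied(log_probs, tied_groups):
    """Greedy decoding where every position in a tied group gets the same residue.

    log_probs: (L, 20) array of per-position log-probabilities.
    tied_groups: list of lists of 0-based positions that must share one residue.
    Positions outside every group are decoded independently by argmax.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    tokens = log_probs.argmax(axis=-1)  # independent greedy choice per position
    for group in tied_groups:
        # Average the log-probabilities of the tied positions, pick one residue,
        # and write that residue to every position in the group.
        combined = log_probs[group].mean(axis=0)
        tokens[group] = combined.argmax()
    return "".join(ALPHABET[t] for t in tokens)


# Toy homotrimer: chains A, B, C of length 2, concatenated as A0 A1 B0 B1 C0 C1.
log_probs = np.full((6, 20), -10.0)
log_probs[0, 0] = -0.1  # A0 strongly prefers Ala
log_probs[2, 1] = -0.5  # B0 prefers Cys
log_probs[4, 1] = -0.5  # C0 prefers Cys
print(decode_tied(log_probs, tied_groups=[]))                      # → AACACA
print(decode_tied(log_probs, tied_groups=[[0, 2, 4], [1, 3, 5]]))  # → CACACA
```

Assuming the chains are concatenated in A, B, C order and each has length n, residue i (0-based) would be tied as the group [i, n + i, 2n + i], so untied decoding can give the three chains different residues while tied decoding cannot.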

I am guessing the featurizer needs to be different, but I'm not sure how. I would love it if someone could help!
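On the featurizer side, ProteinMPNN takes tied positions as a dictionary keyed by structure name, where each entry ties one residue number across chains. A minimal sketch of building that dictionary for a homo-oligomer, assuming every chain has the same length and 1-based residue numbering; make_homooligomer_tied_positions and "my_trimer" are hypothetical names, and nothing in this thread shows ByProt reading this format.

```python
import json


def make_homooligomer_tied_positions(pdb_name, chains, chain_length):
    """Tie residue i of every chain to residue i of all the others.

    Hypothetical helper. Returns {pdb_name: [{"A": [1], "B": [1], ...}, ...]},
    one dict per residue number (1-based), the layout ProteinMPNN uses for
    its tied-positions input. Assumes all chains share length and numbering.
    """
    return {
        pdb_name: [
            {chain: [i] for chain in chains}
            for i in range(1, chain_length + 1)
        ]
    }


tied = make_homooligomer_tied_positions("my_trimer", ["A", "B", "C"], chain_length=3)
print(json.dumps(tied))
# → {"my_trimer": [{"A": [1], "B": [1], "C": [1]}, {"A": [2], "B": [2], "C": [2]}, {"A": [3], "B": [3], "C": [3]}]}
```

One dict per residue number keeps each tie explicit, so heteromeric or partially symmetric designs could list only the chains that should share a residue at that position.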

Thank you!

BSharmi commented 1 month ago

Also, if I run the code snippet from the examples:

# multi-chain complex
# (designer_cath and designer_complex are the designers set up earlier in the example)
pdb_path = "/root/research/projects/ByProt_public/examples/3uat.pdb"

print("designed by cath-trained LM-Design")
designer_cath.set_structure(pdb_path)
print(designer_cath.generate()[0])
designer_cath.calculate_metrics()

print("designed by pdb complex-trained LM-Design")
designer_complex.set_structure(
    pdb_path,
    # chain_list=['A', 'B'],    # which chains to load
    # masked_chain_list=['A'],  # which chains to predict; the remaining chains serve as conditioning
)
print(designer_complex.generate()[0])
designer_complex.calculate_metrics()

I get the same error, KeyError: 'masked_list', raised at masked_chains = b['masked_list'] in /ByProt/src/byprot/datamodules/datasets/multichain.py.
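The failing line looks like it mirrors ProteinMPNN's featurize function, which reads masked_list and visible_list from each parsed entry; the error suggests that the entries built from the PDB file here do not carry those keys. A hypothetical workaround sketch, assuming the entries follow ProteinMPNN's parse_PDB layout with one seq_chain_<ID> key per chain (add_default_chain_lists is an illustrative name, not a ByProt function): fill in both keys before featurizing, masking every chain by default.

```python
def add_default_chain_lists(entry, masked_chains=None):
    """Return a copy of entry with the 'masked_list' and 'visible_list' keys set.

    Hypothetical helper. entry is a dict in ProteinMPNN's parse_PDB layout,
    with one 'seq_chain_<ID>' key per chain. masked_chains are the chains to
    redesign (default: all of them); every other chain is listed as visible,
    i.e. used only as conditioning.
    """
    prefix = "seq_chain_"
    all_chains = sorted(k[len(prefix):] for k in entry if k.startswith(prefix))
    if masked_chains is None:
        masked_chains = all_chains
    unknown = set(masked_chains) - set(all_chains)
    if unknown:
        raise ValueError(f"chains not present in entry: {sorted(unknown)}")
    out = dict(entry)  # shallow copy so the caller's dict is left untouched
    out["masked_list"] = list(masked_chains)
    out["visible_list"] = [c for c in all_chains if c not in masked_chains]
    return out


entry = {"name": "my_trimer", "seq_chain_A": "MKV", "seq_chain_B": "MKV", "seq_chain_C": "MKV"}
print(add_default_chain_lists(entry, masked_chains=["A"])["visible_list"])  # → ['B', 'C']
```

Passing masked_chains=["A"] would redesign chain A while B and C stay visible as conditioning, which matches the masked_chain_list comment in the example; whether ByProt's multichain.py accepts entries patched this way is untested here.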