RosettaCommons / RFdiffusion

Code for running RFdiffusion

Deep Learning Binder Results #94

Open okanders opened 1 year ago

okanders commented 1 year ago

Hi @joewatchwell, I was attempting to design insulin binders of my own (following the example specifications) and then ran the backbones through MPNN_FR and AF2 initial guess. I compared a potential binder to a benchmark insulin binder from the supplementary material of "Improving de novo protein binder design with deep learning", and even though the target template was identical, the results were confusing:

InsulinR_mb: {'plddt_total': 95.02760208110635, 'plddt_binder': 91.0824370734358, 'plddt_target': 96.7371735844303, 'pae_binder': 2.7015252, 'pae_target': 2.4785695, 'pae_interaction': 4.80579948425293, 'time': 146.24850199604407}

design_ppi_scaffolded_6_dldesign_4: {'plddt_total': 52.28481075101907, 'plddt_binder': 94.86918808372161, 'plddt_target': 33.83158057351464, 'pae_binder': 1.7483437, 'pae_target': 19.690062, 'pae_interaction': 26.69976043701172, 'time': 16.96811721706763}

I swapped the target PDB structure of InsulinR_mb (chain B) in for the target structure of design_ppi_scaffolded_6_dldesign_4 and got the expected results. It seems that RFdiffusion strips a great deal of side-chain detail from the target structure when it is submitted to make a binder. Is there a way to preserve that detail in the target structure? Thanks, I would appreciate the help!
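A quick way to see where the two score sets above diverge is to difference them key by key. This is only a minimal sketch: the values are copied (rounded) from the two dictionaries above, and the function names and the thresholds `plddt_drop` and `pae_rise` are arbitrary illustrative assumptions, not cutoffs used by the dl_binder_design scripts.

```python
# Scores copied (rounded) from the two AF2 initial-guess outputs above.
reference = {  # InsulinR_mb
    "plddt_binder": 91.08, "plddt_target": 96.74,
    "pae_binder": 2.70, "pae_target": 2.48, "pae_interaction": 4.81,
}
design = {  # design_ppi_scaffolded_6_dldesign_4
    "plddt_binder": 94.87, "plddt_target": 33.83,
    "pae_binder": 1.75, "pae_target": 19.69, "pae_interaction": 26.70,
}


def score_deltas(ref, new):
    """Return new - ref for every key the two score dicts share."""
    return {key: round(new[key] - ref[key], 2) for key in ref if key in new}


def flag_regressions(ref, new, plddt_drop=10.0, pae_rise=5.0):
    """List the keys that got clearly worse.

    Lower pLDDT is worse and higher PAE is worse. The thresholds are
    hypothetical values chosen only for this illustration.
    """
    flagged = []
    for key, delta in score_deltas(ref, new).items():
        if key.startswith("plddt") and delta <= -plddt_drop:
            flagged.append(key)
        elif key.startswith("pae") and delta >= pae_rise:
            flagged.append(key)
    return flagged


print(flag_regressions(reference, design))
```

With these numbers the binder-only metrics are not flagged, while the target and interaction metrics are, which matches the observation that the problem sits on the target side of the complex.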

design_ppi_scaffolded_6_dldesign_4.pdb.zip

InsulinR_mb.pdb.zip

nrbennet commented 1 year ago

Can you send the output structure from RFdiffusion and the output structure from MPNN-FR?

okanders commented 1 year ago

@nrbennet, Hi, the PDB I sent above was the output of MPNN (I just used MPNN). Here are my outputs from RFdiffusion and the AF2 initial guess from pdb_interfaceAF2.py:

InsulinDesign_RFDiff_a2pred.zip

Thanks so much for the help!

nrbennet commented 1 year ago

Can you send the command you used to generate the diffusion designs?

okanders commented 1 year ago

@nrbennet, yes of course. I made some edits to dl_interface_design.py, but none that should have affected the target sequence.

RFDiffusion (example provided as script):

../scripts/run_inference.py scaffoldguided.target_path=input_pdbs/insulin_target.pdb inference.output_prefix=example_outputs/ppi_test/design_ppi_scaffolded scaffoldguided.scaffoldguided=True 'ppi.hotspot_res=[A59,A83,A91]' scaffoldguided.target_pdb=True scaffoldguided.target_ss=target_folds/insulin_target_ss.pt scaffoldguided.target_adj=target_folds/insulin_target_adj.pt scaffoldguided.scaffold_dir=./ppi_scaffolds/ inference.num_designs=10 denoiser.noise_scale_ca=0 denoiser.noise_scale_frame=0

MPNN:

python dl_interface_design.py -pdbdir ../input/proteinmpnn/insulin -outpdbdir ../output/proteinmpnn/insulin -seqs_per_struct 5

I edited the dl_interface_design file to take in a PDB; the PDB construction at the end seems fine.

AF2:

python pdb_interfaceAF2predict.py -pdb_dir ../output/proteinmpnn/insulin

I use the PDB interface, as I am not using silent files.

nrbennet commented 1 year ago

The outputs of RFdiffusion look fine, but they are getting messed up in MPNN. Can you try using the latest version of the dl_binder_design scripts? I have modified them so that they can handle either silent files or PDB files, so you can use the scripts out of the box without modifying them.

okanders commented 1 year ago

@nrbennet Unfortunately, I do not have access to PyRosetta. I am looking into a Biopython substitute for the MPNN sequence-design step. Is a translation to Biopython or another toolkit possible? I don't think the sequence encoding depends on Pose functions. Thanks so much, I truly appreciate all the help!

nrbennet commented 1 year ago

It's possible to write the scripts in Biopython; it will just require some work. You will need to go through the script and do all of the chain parsing with Biopython. The modified MPNN step is for sure the issue with your results, though.
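To give a rough idea of what the chain parsing involves, here is a dependency-free sketch that reads the per-chain sequence straight from the fixed-column ATOM records of a PDB file. It is a stand-in, not the dl_binder_design implementation and not Biopython code; the function name `chain_sequences` is hypothetical, and a Biopython version would do the same walk over chains and residues through its own parser.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from CA atoms.

    Relies on the fixed-column PDB layout: atom name in columns 13-16,
    residue name in 18-20, chain ID in 22, residue number plus
    insertion code in 23-27. Unknown residue names become 'X'.
    """
    sequences = {}
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":
            continue
        chain_id = line[21]
        residue_key = (chain_id, line[22:27])
        if residue_key in seen:
            # Same residue listed again (e.g. an alternate location).
            continue
        seen.add(residue_key)
        one_letter = THREE_TO_ONE.get(line[17:20], "X")
        sequences.setdefault(chain_id, []).append(one_letter)
    return {chain: "".join(seq) for chain, seq in sequences.items()}
```

Running this on the RFdiffusion output and on the MPNN output, then comparing the target chain's sequence between the two, is one way to confirm that a modified script has left the target sequence unchanged.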

If you're an academic, you should be able to get a free license to PyRosetta, which would save you quite a bit of time since you can just use all of the tools out of the box.

okanders commented 1 year ago

@nrbennet I was wondering what is wrong with my MPNN output. The target has the same sequence and coordinates as in the RFdiffusion output. Do you modify the target, which is backbone-only, at all? (I ask because my plddt_target is low.)

How do you handle generating and threading the sequence? I don't believe you alter the target structure.
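The claim that the target keeps the same coordinates can be checked without PyRosetta. The sketch below compares CA positions of one chain between two PDB files in the same coordinate frame, with no superposition. The function names are hypothetical, and which chain ID holds the target in a given file is something to confirm by inspecting that file.

```python
import math


def ca_coordinates(pdb_text, chain_id):
    """Map residue number (with insertion code) to the CA (x, y, z) of one chain."""
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[21] != chain_id or line[12:16].strip() != "CA":
            continue
        key = line[22:27].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(key, xyz)  # keep the first entry per residue
    return coords


def max_ca_deviation(pdb_a, pdb_b, chain_id):
    """Largest CA-CA distance (in Angstrom) between matching residues."""
    a = ca_coordinates(pdb_a, chain_id)
    b = ca_coordinates(pdb_b, chain_id)
    if not a or a.keys() != b.keys():
        raise ValueError("chain missing or residue numbering differs")
    return max(math.dist(a[key], b[key]) for key in a)
```

A value near zero means the backbone of that chain was carried over unchanged. It says nothing about side chains, which this check deliberately ignores.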

design_ppi_scaffolded_6_dldesign_4.pdb.zip

okanders commented 1 year ago

@nrbennet,

Hi Nathaniel, I was wondering what your PDB file looks like after ProteinMPNN. Mine is just the backbone plus the encoded sequence, which does not seem to be enough to represent the target accurately.
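Whether a PDB file really is backbone-only can be checked per chain with a few lines of plain Python. This is a hedged sketch under the assumption that the file uses standard PDB atom names; the function name `sidechain_coverage` is hypothetical. It reports, for each chain, the fraction of non-glycine residues that carry at least one side-chain heavy atom.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def sidechain_coverage(pdb_text):
    """Return {chain_id: fraction of non-GLY residues with side-chain heavy atoms}."""
    residues = {}  # (chain, residue number + insertion code) -> has side chain
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[17:20] == "GLY":
            continue  # glycine has no side-chain heavy atoms
        name = line[12:16].strip()
        if name.lstrip("0123456789").startswith("H"):
            continue  # ignore hydrogens
        key = (line[21], line[22:27])
        has_sidechain = name not in BACKBONE_ATOMS
        residues[key] = residues.get(key, False) or has_sidechain

    totals = {}  # chain -> (residue count, residues with side chains)
    for (chain, _), flag in residues.items():
        count, with_sc = totals.get(chain, (0, 0))
        totals[chain] = (count + 1, with_sc + int(flag))
    return {chain: with_sc / count for chain, (count, with_sc) in totals.items()}
```

A chain that scores 0.0 has residue names but no side-chain atoms, which is the "backbone plus encoded sequence" situation described above; a fully built chain scores 1.0.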

nrbennet commented 1 year ago

mpnn_example.zip

Here is an example of what the structure looks like after MPNN. The script automatically runs a cycle of Rosetta FastRelax, so the side chains will not be clashing here.