RosettaCommons / RFdiffusion

Code for running RFdiffusion

Why do the output PDB designs contain only glycines? #185

Open Hanoriega opened 7 months ago

Hanoriega commented 7 months ago

Hello,

Thank you for creating this wonderful software; I foresee great success! I am trying to run a simple unconditional monomer design from my PDB file, following the README.md tutorials. I have my environment set up and the PDB file (an AAV monomer) in the path I want. This is my command:

(SE3nv) C:\my\Path\RFdiffusion>python .\scripts\run_inference.py "contigmap.contigs=[534-534]" inference.input_pdb=C:\my\Path\RFdiffusion\sample.pdb inference.output_prefix=test_outputs\test inference.num_designs=10

When I run this, all 10 designs come out with a sequence of only Glycine and do not look anything like an AAV monomer. What am I doing wrong or not understanding? Please help me understand.

Thanks,

Heather

Also, disclosure: I am relatively new to computer science and not an avid coder, so please be gentle and break down explanations if possible.

roccomoretti commented 7 months ago

RFdiffusion was trained only to design/predict the backbone conformation of proteins. It operates at the backbone level, without (explicit) consideration of side chains. As such, it outputs "backbone only" structures, which for usability are annotated as GLY.
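If you want to confirm that a design is a backbone-only placeholder rather than a failed run, a quick check is to tally the residue names in the output file. The snippet below is a minimal sketch and not part of RFdiffusion: it assumes the standard fixed-column PDB layout (atom name in columns 13-16, residue name in columns 18-20), and both `residue_name_counts` and the demo helper `make_atom_line` are hypothetical names made up for this illustration.

```python
from collections import Counter


def residue_name_counts(pdb_text: str) -> Counter:
    """Count residues by three-letter name, one count per CA atom."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # Fixed PDB columns (1-indexed): 13-16 atom name, 18-20 residue name.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            counts[line[17:20].strip()] += 1
    return counts


def make_atom_line(serial: int, atom: str, res: str, chain: str, seq: int) -> str:
    """Hypothetical helper: build the leading columns of an ATOM record for the demo."""
    return f"ATOM  {serial:>5} {atom:<4} {res:>3} {chain}{seq:>4}"


# Tiny made-up backbone: three residues, each with N, CA, C and O atoms.
demo_lines = []
serial = 1
for seq in (1, 2, 3):
    for atom in (" N  ", " CA ", " C  ", " O  "):
        demo_lines.append(make_atom_line(serial, atom, "GLY", "A", seq))
        serial += 1
demo_pdb = "\n".join(demo_lines)

counts = residue_name_counts(demo_pdb)
only_glycine = set(counts) == {"GLY"}
print(counts, only_glycine)
```

For a real design you would read the file contents (for example with `open(path).read()`) and pass them to `residue_name_counts`. Seeing only GLY there is the expected result for a backbone-only output.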

If you're interested in placing amino acid sequences onto the backbone, you can take the output of RFdiffusion and pass it through ProteinMPNN.

Also, the normal operation of RFdiffusion is to design backbone structures de novo, with the only considerations being their length and generalized protein-likeness. It can do partial regeneration (e.g. redoing loops) or be "conditioned" on certain features (like secondary structure or protein contacts), but those require additional settings to get working properly. I'd recommend working through the examples in the examples directory to get a better sense of things. I get the impression, though, that you're looking for fold conditioning, which has its own section in the README as well as in the examples.
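To make the difference between unconditional and conditioned runs concrete, here is a small sketch that labels the pieces of a contig string. It assumes the contig convention described in the README, where a plain range such as `534-534` asks for that many newly generated residues and a chain-prefixed range such as `A163-181` keeps those residues from the input PDB. The function `describe_contig` is a hypothetical helper written for this illustration, not part of RFdiffusion, and it deliberately ignores chain breaks and any other contig syntax.

```python
import re


def describe_contig(contig: str) -> list[tuple[str, str]]:
    """Label each '/'-separated contig segment (simple cases only)."""
    labelled = []
    for segment in contig.strip("[]").split("/"):
        segment = segment.strip()
        if re.fullmatch(r"[A-Za-z]\d+-\d+", segment):
            # Chain letter plus residue range: residues taken from the input PDB.
            labelled.append(("kept from input PDB", segment))
        elif re.fullmatch(r"\d+-\d+", segment):
            # Plain length range: residues generated from scratch.
            labelled.append(("newly generated", segment))
        else:
            # Chain breaks and other syntax are outside this sketch.
            labelled.append(("not handled in this sketch", segment))
    return labelled


unconditional = describe_contig("[534-534]")
scaffolding = describe_contig("[10-40/A163-181/10-40]")
print(unconditional)
print(scaffolding)
```

Read this way, a contig of `[534-534]` contains no segment that refers to the input PDB, which is consistent with the designs not resembling the AAV monomer.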

Hanoriega commented 7 months ago

@roccomoretti Thank you for your response! I was just trying to make sure my software clone and environment were working properly, so I ran the unconditional example as a tutorial. I did not remember reading that the backbone-only output is annotated as glycine, so I asked. I passed the designs to ProteinMPNN, which gave me sequences, and then ran those through AlphaFold2. The predicted structures look similar to the RFdiffusion PDB output. So, for my purposes, I now need to work on understanding conditional generation.

I am interested in motif scaffolding to create DNA-binding motifs on the backbone, and in the 60-mer capsid that Will Sheffler worked on (from the presentation), to confirm the capsid structures of a similar family.