RosettaCommons / RFdiffusion

Code for running RFdiffusion

Partial diffusion errors #214

Open ibalafkir opened 7 months ago

ibalafkir commented 7 months ago

Hello, I've run into a contig error when doing partial diffusion on a PDB composed of chains 'A' and 'B'.

When I try to keep chain A fixed and diffuse some residues in chain B using the contigs [A1-124/0 B1-2/3-3/B6-121], I get this error:

AssertionError: for partial diffusion there can be no offset between the index of a residue in the input and the index of the residue in the output

However, partial diffusion runs without any problem if I also diffuse a random residue in chain A: [A1-123/1-1/0 B1-2/3-3/B6-121]

I can also run this: [A1-124/0 B1-121]

Zasder3 commented 3 months ago

We ran into this issue as well when trying a similar diffusion process! There's a quirk in the PDB numbering scheme: RFdiffusion renumbers its outputs so that the binder runs from 1 to N and the underlying target from N+1 to L. This means that, to get most PDB structures properly formatted, you'll want to renumber the outputs and potentially rename the chains (RFD names the binder A and B).
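As a rough illustration of the renumbering step mentioned above, here is a minimal sketch that renumbers residues sequentially across all chains using the standard PDB fixed-column layout (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The function name `renumber_pdb_lines` is hypothetical and is not part of RFdiffusion; it also clears insertion codes, which may not be what you want for every structure.

```python
def renumber_pdb_lines(lines, start=1):
    """Renumber residues sequentially across all chains, beginning at `start`.

    Hypothetical helper, not an RFdiffusion utility. Only ATOM/HETATM records
    are modified; every other line is passed through unchanged.
    """
    renumbered = []
    current_key = None
    number = start - 1
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 27:
            # A residue is identified by chain ID + residue number + insertion code.
            key = (line[21], line[22:27])
            if key != current_key:
                current_key = key
                number += 1
            # Write the new number right-aligned in columns 23-26 and blank
            # the insertion code, keeping the line length unchanged.
            line = line[:22] + f"{number:>4}" + " " + line[27:]
        renumbered.append(line)
    return renumbered
```

A typical use would be to read the file with `open(path).read().splitlines()`, pass the list through the function, and write the result back out. Renaming chains would be a separate edit of column 22.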

roccomoretti commented 1 month ago

To further clarify, RFdiffusion will reorder the chains. Chains with diffusable residues will be placed before chains without any diffusable residues. So when you provide the contigmap [A1-124/0 B1-2/3-3/B6-121], RFdiffusion will move chain B (which has a diffusable region) before chain A (which lacks one). For [A1-123/1-1/0 B1-2/3-3/B6-121] both have diffusable regions, so they're not reordered.

Normally this doesn't matter (except for the order in the output PDB), but for partial diffusion it needs to match up the residues in the contigmap (including the diffused residues) with the residues in the input PDB, and the reordering confuses it.
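The reordering rule described above can be sketched in a few lines of Python, to check a contig string before launching a run. This is only a model of the behaviour as explained in this thread, not RFdiffusion's actual code; the function names are hypothetical, and the parser only handles the token forms that appear here (`A1-124`, `3-3`, and the `0` chain-break marker).

```python
import re

def parse_contig(contig):
    """Split a contig string into chains, each a list of (kind, token) pairs."""
    chains = []
    for block in contig.split():          # whitespace separates chains
        segments = []
        for token in block.split("/"):
            if token == "0":              # chain-break marker, carries no residues
                continue
            match = re.fullmatch(r"([A-Za-z]?)(\d+)-(\d+)", token)
            if match is None:
                raise ValueError(f"unsupported contig token: {token!r}")
            # A leading chain letter means residues taken from the input (fixed);
            # a bare range means residues to be diffused.
            kind = "fixed" if match.group(1) else "diffused"
            segments.append((kind, token))
        chains.append(segments)
    return chains

def predicted_chain_order(contig):
    """Return chain indices in the order implied by the rule described above:
    chains with any diffused segment first, original order otherwise kept."""
    chains = parse_contig(contig)
    has_diffused = [any(kind == "diffused" for kind, _ in c) for c in chains]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(range(len(chains)), key=lambda i: not has_diffused[i])

def would_reorder(contig):
    order = predicted_chain_order(contig)
    return order != list(range(len(order)))
```

Under this model, `would_reorder("A1-124/0 B1-2/3-3/B6-121")` is true, while the two contigs reported to run, `A1-123/1-1/0 B1-2/3-3/B6-121` and `A1-124/0 B1-121`, leave the chain order unchanged.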

ibalafkir commented 1 month ago

Thanks to both of you for your help! :)

halasadi commented 1 day ago

I have the same error here.

I would like to partially diffuse certain parts of a protein while fixing others. I have a protein with chains A,B,C,D that is numbered in the PDB file from 0-410 sequentially.

For example:

python RFdiffusion/scripts/run_inference.py inference.output_prefix="scratch/example_out" inference.input_pdb="scratch/protein.renumbered.pdb" contigmap.contigs='["A0-174/0 9-9/0 C184-270/14-14/C285-294/0 D295-384/17-17/D402-410"]' diffuser.partial_T=10 inference.num_designs=10

Here I'd like to fix residues 0-174 in chain A, 184-270 + 285-294 in chain C, and 295-384 + 402-410 in chain D, while partially diffusing the other parts of the protein (as marked by 9-9, 14-14, and 17-17).
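One way to sanity-check a contig like this is to lay out where each segment would land if the segments were simply concatenated in the order written, and compare that with the input numbering. The sketch below does only that arithmetic. The function name `contig_layout` and the `first_index` parameter are hypothetical, it handles only fixed-length diffused ranges such as `9-9`, and it deliberately ignores any chain reordering RFdiffusion may apply.

```python
import re

def contig_layout(contig, first_index=0):
    """Return (token, out_start, out_end) for each contig segment, assuming
    segments are concatenated in the order written, starting at `first_index`."""
    layout = []
    position = first_index
    for block in contig.split():
        for token in block.split("/"):
            if token == "0":                      # chain break, no residues
                continue
            match = re.fullmatch(r"([A-Za-z]?)(\d+)-(\d+)", token)
            if match is None:
                raise ValueError(f"unsupported contig token: {token!r}")
            chain_id, low, high = match.group(1), int(match.group(2)), int(match.group(3))
            if chain_id:
                length = high - low + 1           # residues copied from the input
            else:
                if low != high:
                    raise ValueError("variable-length ranges are not handled in this sketch")
                length = low                      # "9-9" means exactly 9 residues
            layout.append((token, position, position + length - 1))
            position += length
    return layout

contig = "A0-174/0 9-9/0 C184-270/14-14/C285-294/0 D295-384/17-17/D402-410"
for token, start, end in contig_layout(contig):
    print(f"{token:>10}  ->  {start}-{end}")
```

For this contig the layout ends at index 410 and every fixed segment lands on its own input numbering, so the segment lengths themselves are consistent with a PDB numbered 0-410.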

However, it gives me the same error as OP:

AssertionError: for partial diffusion there can be no offset between the index of a residue in the input and the index of the residue in the output,

So what would be the suggested solution here? Do I need to re-order the PDB somehow? I'm a bit confused here because I have multiple chains I'd like to partially diffuse. Or is this problem too complex for RFDiffusion?

Thank you in advance for your help!