Open patricia-rocha opened 2 months ago
I'm unsure how well RFdiffusion works with insertion codes. You can try explicitly including things in the contig (something like [B1-52/B52A-52A/B53-94/14-20/B103-111]), but I haven't tried that at all. You may be better off just renumbering the input file to remove the insertion codes(*). RFdiffusion will renumber/relabel the output structure anyway.
*) Automated methods exist. I'd personally use Rosetta to do it, because that's what I'm familiar with. Other programs will also do it, though.
Hi @roccomoretti
After further investigation, I identified the cause of the behavior I described. When RFdiffusion parses the PDB files using the parse_pdb_lines function, it only considers the sequence numbers, excluding any insertion codes. As a result, in my example, (Gly, 52), (Ser, 52A), (Ser, 53) become (Gly, 52), (Ser, 52), (Ser, 53). Duplicate sequence numbers are then removed, so the final sequence is (Gly, 52), (Ser, 53), and the insertion (Ser, 52A) is discarded.
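The effect described above can be reproduced with a minimal sketch. This is not RFdiffusion's parse_pdb_lines; it is a simplified stand-in that assumes standard fixed-column PDB ATOM records (residue number in columns 23-26, insertion code in column 27). The names atom_line and parse_residues are hypothetical helpers introduced only for this illustration.

```python
def atom_line(serial, resname, chain, resseq, icode=" "):
    """Build a fixed-column PDB ATOM record for a CA atom (hypothetical helper)."""
    x = 0.0
    return (f"ATOM  {serial:5d} {'CA':<4s} {resname:>3s} {chain}"
            f"{resseq:4d}{icode}   {x:8.3f}{x:8.3f}{x:8.3f}")


def parse_residues(pdb_lines, keep_icode):
    """Collect one entry per residue, keyed with or without the insertion code."""
    seen = set()
    residues = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20]
        chain = line[21]
        resseq = int(line[22:26])
        icode = line[26].strip()
        # Ignoring the insertion code makes 52 and 52A share one key.
        key = (chain, resseq, icode) if keep_icode else (chain, resseq)
        if key in seen:
            continue  # treated as a duplicate and dropped
        seen.add(key)
        label = f"{resseq}{icode}" if keep_icode else str(resseq)
        residues.append((resname, label))
    return residues


lines = [
    atom_line(1, "GLY", "B", 52),
    atom_line(2, "SER", "B", 52, "A"),
    atom_line(3, "SER", "B", 53),
]

without_icode = parse_residues(lines, keep_icode=False)
with_icode = parse_residues(lines, keep_icode=True)
print(without_icode)  # Ser 52A is missing
print(with_icode)     # all three residues are kept
```

Keying residues on the number alone is what makes Ser 52A look like a repeat of Gly 52; including the insertion code in the key keeps all three residues.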
To address this, I have opted to renumber my files without insertion codes.
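For reference, a minimal sketch of one way to do that renumbering in plain Python, as an alternative to Rosetta or other tools. It assumes fixed-column ATOM/HETATM records, residues listed in ascending order within each chain, and insertion-coded residues placed after their base number; it does not touch TER, ANISOU, or other records that carry residue numbers. The function renumber_without_icodes and the helper atom_line are hypothetical names, not part of RFdiffusion.

```python
def atom_line(serial, resname, chain, resseq, icode=" "):
    """Build a fixed-column PDB ATOM record for a CA atom (hypothetical helper)."""
    x = 0.0
    return (f"ATOM  {serial:5d} {'CA':<4s} {resname:>3s} {chain}"
            f"{resseq:4d}{icode}   {x:8.3f}{x:8.3f}{x:8.3f}")


def renumber_without_icodes(pdb_lines):
    """Shift residue numbers so that no insertion codes remain.

    Each insertion-coded residue bumps the numbering of everything after it
    on the same chain by one, so existing gaps in the numbering are preserved.
    Returns the rewritten lines and a mapping from the old
    (chain, number, insertion code) to the new number.
    """
    offset = {}   # chain -> number of insertion-coded residues seen so far
    mapping = {}  # (chain, resseq, icode) -> new residue number
    out = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            out.append(line)  # other records are passed through unchanged
            continue
        chain = line[21]
        resseq = int(line[22:26])
        icode = line[26]
        key = (chain, resseq, icode)
        if key not in mapping:
            if icode != " ":
                offset[chain] = offset.get(chain, 0) + 1
            mapping[key] = resseq + offset.get(chain, 0)
        # Write the new number in columns 23-26 and blank the insertion code.
        out.append(f"{line[:22]}{mapping[key]:4d} {line[27:]}")
    return out, mapping


lines = [
    atom_line(1, "GLY", "B", 52),
    atom_line(2, "SER", "B", 52, "A"),
    atom_line(3, "SER", "B", 53),
]
renumbered, mapping = renumber_without_icodes(lines)
for line in renumbered:
    print(line)
```

Because every residue after an insertion shifts by one, the contig has to be rewritten against the new numbers; the returned mapping is meant to help with that translation.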
While performing motif scaffolding with RFdiffusion, I have noticed that the fixed segments in the generated PDB files do not exactly match those in the reference PDB file. For example, I'm using the contig [B1-94/14-20/B103-111], and my PDB file has the residues Gly, Ser, Ser at positions 52, 52A, 53. However, in the generated PDB files, only Gly and Ser are present at positions 52 and 53, with the insertion at 52A being discarded.
Questions: