aqlaboratory / genie2

Protein structure diffusion model for unconditional protein generation and motif scaffolding
Apache License 2.0

RFDiffusion scaffold benchmark #4

Open JoshuaMeyers opened 2 weeks ago

JoshuaMeyers commented 2 weeks ago

Hey guys, cool work, it's a really nice read. Just a quick comment: I believe the motif scaffolding benchmark as you have implemented it does not exactly match the RFDiffusion motifs. The three cases that have small, medium and large scaffolds (6exz, 6e6r, 5trv) differ slightly between the RFDiffusion outputs and your PDB file specifications. This is likely because RFDiffusion seems to index these files differently, but I thought I'd flag it in any case.

For example, if you download the output sequences from the RFDiffusion paper (https://figshare.com/s/439fdd59488215753bc3) and look at the file Motif_scaffolding_benchmark.fasta, you will see that for 6exz the fixed motif is HLE...FMLA, while in your file it is LET...MLAE (i.e. offset by one residue). The same is true of the other two cases I mentioned.
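A quick way to confirm this kind of off-by-one is to slide one motif string against the other and report the smallest shift at which they agree. The sketch below is a hypothetical helper (`find_shift` is not part of Genie 2 or RFDiffusion) and assumes both motifs are available as plain one-letter amino-acid strings; the example inputs are only the visible ends quoted above (HLE vs LET, FMLA vs MLAE), not the full motifs.

```python
from typing import Optional


def find_shift(reference: str, candidate: str,
               max_shift: int = 3, min_overlap: int = 2) -> Optional[int]:
    """Return shift k such that candidate[i] == reference[i + k] on the overlap.

    A positive k means the candidate window starts k residues later in the
    chain than the reference window. Shifts are tried in order of increasing
    magnitude (0, +1, -1, +2, -2, ...), so the smallest consistent offset wins.
    Returns None if no shift within max_shift gives an overlap of at least
    min_overlap matching residues.
    """
    shifts = sorted(range(-max_shift, max_shift + 1), key=lambda k: (abs(k), -k))
    for k in shifts:
        if k >= 0:
            # Drop the first k residues of the reference.
            ref_part, cand_part = reference[k:], candidate
        else:
            # Drop the first |k| residues of the candidate.
            ref_part, cand_part = reference, candidate[-k:]
        n = min(len(ref_part), len(cand_part))
        if n >= min_overlap and ref_part[:n] == cand_part[:n]:
            return k
    return None


# Visible motif ends for 6exz quoted in this thread.
print(find_shift("HLE", "LET"))    # start of motif
print(find_shift("FMLA", "MLAE"))  # end of motif
```

Both calls print `1`, consistent with the one-residue offset described above. With full motif strings taken from each set of files, the same check could be applied to 6e6r and 5trv; short fragments can match spuriously, which is why `min_overlap` is exposed.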

Let me know if you disagree; we hit this minor gotcha a few weeks ago.

yeqinglin commented 2 weeks ago

Hi @JoshuaMeyers, thank you very much for pointing this out. We followed the specification from the RFDiffusion paper for this motif scaffolding benchmark, and the fixed motif might differ slightly because of reindexing. We will double-check our motif files again. However, we believe this should not affect our performance comparisons, since we evaluate both RFDiffusion and Genie 2 on the same set of motif files.