bjing2016 / alphaflow

AlphaFold Meets Flow Matching for Generating Protein Ensembles
MIT License

Diffusion scheduling code making abnormal protein output #29

Open jmoojun opened 4 months ago

jmoojun commented 4 months ago

I believe your code has some discrepancies when compared to the pseudocode in your article.


    Algorithm 1 TRAINING
    Input: Training examples of structures, sequences, and MSAs {(S_i, A_i, M_i)}
    for all (S_i, A_i, M_i) do
        Extract x_1 ← BetaCarbons(S_i)
        Sample x_0 ∼ HarmonicPrior(length(A_i))
        Align x_0 ← RMSDAlign(x_0, x_1)
        Sample t ∼ Uniform[0, 1]
        Interpolate x_t ← t · x_1 + (1 − t) · x_0
        Predict Ŝ_i ← AlphaFold(A_i, M_i, x_t, t)
        Optimize loss L = FAPE²(Ŝ_i, S_i)
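For reference, the interpolation step of Algorithm 1 can be sketched in plain Python. This is only an illustration in the paper's time convention (t=0 is the prior sample, t=1 is the data); the function name `interpolate` and the toy coordinates are assumptions made for this example, not code from the repository.

```python
def interpolate(x0, x1, t):
    """Linear interpolant x_t = t * x1 + (1 - t) * x0 (paper convention)."""
    return [
        [t * b + (1.0 - t) * a for a, b in zip(p0, p1)]
        for p0, p1 in zip(x0, x1)
    ]

# Toy 2-residue "structures" with hypothetical coordinates.
x0 = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # stands in for the aligned prior sample
x1 = [[2.0, 0.0, 0.0], [3.0, 1.0, 5.0]]  # stands in for the beta carbons

xt_start = interpolate(x0, x1, 0.0)  # equals x0: pure prior sample
xt_mid = interpolate(x0, x1, 0.5)    # halfway between prior and data
xt_end = interpolate(x0, x1, 1.0)    # equals x1: data
```

In this convention the interpolant starts at the prior sample and ends at the data as t goes from 0 to 1.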

Does this pseudocode correspond to your code in wrapper.py ModelWrapper.distillation_training_step?


    for t, s in zip(schedule[:-1], schedule[1:]):
        output = self.teacher(batch, prev_outputs=prev_outputs)
        pseudo_beta = pseudo_beta_fn(batch['aatype'], output['final_atom_positions'], None)
        noisy = rmsdalign(pseudo_beta, noisy)
        noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta

The same holds in ModelWrapper.inference.

The atoms in the PDB output seem to be clustered together very densely, which makes for an abnormal protein structure.

image
bjing2016 commented 2 months ago

Which output are you showing here? In the code, the time index is flipped, so t=1 in the paper corresponds to t=0 in the code, and vice versa. Sorry that this is not documented more clearly.
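To illustrate the flipped convention, here is a minimal sketch, assuming the code's time runs from t=1 (pure noise) down to t=0 (data), so that the interpolant is x_t = (1 - t) * x1 + t * x0. Under that assumption, the update quoted earlier in this thread, `noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta`, moves a point on the interpolant at time t to the point at time s < t whenever the prediction equals the true data. The names `interpolate_code_time` and `step`, the schedule, and the toy values are hypothetical and not taken from the repository.

```python
def interpolate_code_time(noise, data, t):
    """Assumed code convention: t=1 is pure noise, t=0 is data."""
    return [(1.0 - t) * d + t * n for n, d in zip(noise, data)]

def step(noisy, pred, t, s):
    """Move from time t to s < t, mirroring the quoted update
    noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta."""
    w = s / t
    return [w * x + (1.0 - w) * p for x, p in zip(noisy, pred)]

noise = [4.0, -2.0, 8.0]  # hypothetical prior sample (flattened coordinates)
data = [1.0, 0.0, 2.0]    # hypothetical target, also used as a perfect prediction

schedule = [1.0, 0.75, 0.5, 0.25, 0.0]  # decreasing: noise -> data (assumed)
noisy = list(noise)
max_err = 0.0
for t, s in zip(schedule[:-1], schedule[1:]):
    noisy = step(noisy, data, t, s)
    # With a perfect prediction, each step should land on the interpolant at s.
    expected = interpolate_code_time(noise, data, s)
    max_err = max(max_err, max(abs(a - b) for a, b in zip(noisy, expected)))
```

With a perfect predictor, `noisy` ends at `data` after the final step. Note that a schedule starting at t=0 would make `s / t` divide by zero, which is consistent with the code's time starting at 1 rather than 0.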