OmicsML / scDiff

scDiff: A General Single-Cell Analysis Framework via Conditional Diffusion Generative Models
MIT License
22 stars 4 forks source link

The big issue of denoising code! #3

Closed EddieBio closed 8 months ago

EddieBio commented 8 months ago

Hi,

I found that the x update does not rely on the model itself during denoising except for the last time step. In the function p_sample_loop, the update of x is covered by the function self.prepare_noised_input, which is shown below:

image

Could you please explain it since it seriously affects the results of the paper?

Thank you very much!

EddieBio commented 8 months ago

More than that, the code does not corrupt the single-cell data. Instead, it uses the function self.prepare_noised_input to add a little noise to the whole single-cell data then use the model to calculate x_0. The denoising mask is only applied to the x_start as the contextual information. In other words, the author does not corrupt the data and just uses the model to recover x_0 from x_1, thus the performance of denoising is very high. image

RemyLau commented 8 months ago

Hi @EddieBio, thank you so much for pointing this out! You are indeed right about the mistake on the inference steps due to incorrectly "resetting" for the masked entries (inpainting setting). I have just fixed this in this commit (5de418c) by correctly resetting the masked entries.

Since the issue was on the inference side, the model training was unaffected. After rerunning the experiments using the fixed code, we've achieved very similar results as before (see table below). This is also suggests that the diffusion model was trained optimally so that its full inpainting performance (generate from t=1,000) is very close, if not identical, to one-step inpainting (generate from t=1).

Thank you once again for catching this mistake! Please feel free to close the issue if you feel we have addressed your concern sufficiently.

Dataset scDiff (new) scDiff (old) Diff (new - old)
Jurkat 0.8223 ± 0.00020 0.82259 ± 0.00012 -0.0003
293T 0.8102 ± 0.00008 0.81108 ± 0.00012 -0.0009
PBMC1K 0.7744 ± 0.00010 0.77497 ± 0.00036 -0.0006
EddieBio commented 8 months ago

Thank you for your reply. However, the x is still updated by the function self.prepare_noised_input and q_sample rather than the reversed diffusion steps (p_sample) except for the last time step.

RemyLau commented 8 months ago

@EddieBio didn't I replace the prepare_noised_input function call already? I'm a bit confused about your last comment. Could you please elaborate?

EddieBio commented 8 months ago

@EddieBio didn't I replace the prepare_noised_input function call already? I'm a bit confused about your last comment. Could you please elaborate?

I see. It really solves the issue! Thank you very much!