CompVis / depth-fm

DepthFM: Fast Monocular Depth Estimation with Flow Matching
MIT License

Why is the NFE=1 result for Marigold pure noise? #3

Open w-hc opened 6 months ago

w-hc commented 6 months ago

Hi, thanks for the inspiring work. In Fig. 6, the Marigold result at NFE=1 is pure noise. That seems counter-intuitive: at NFE=1 we should just get the conditional mean of the prediction, i.e. x0 hat. It may be blurry, but it's hard to see why it should be pure noise.
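For reference, the "x0 hat" mentioned above can be written out as follows. This is a sketch that assumes the standard epsilon-prediction parameterization with cumulative noise schedule \bar{\alpha}_t; Marigold's actual parameterization may differ, and the symbols are not taken from the paper.

```latex
% Forward process (assumed): x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon
% One-step estimate of the clean sample from the network output \epsilon_\theta:
\hat{x}_0(x_t, t) = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\; \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}
\;\approx\; \mathbb{E}\left[ x_0 \mid x_t \right]
```

With a well-trained network and x_t drawn as pure noise at the largest timestep, this estimate would be expected to look like a blurry average rather than noise.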

Fannovel16 commented 6 months ago

I think NFE=1 is just another way of saying "1 denoising step"

mgui7 commented 6 months ago

Hi w-hc, as Fannovel16 pointed out, NFE=1 means predicting the depth in a single step. Marigold uses the DDIM sampler, which approximates the diffusion SDE with an ODE, and fewer inference steps result in a larger ODE approximation error. This basically always leads to noise or noisy images. For more details, please refer to DDIM and DPM-Solver.

jiahaoli95 commented 6 months ago

I believe the DDIM solver in diffusers is not intended to be used with NFE=1. In that case the diffusers implementation uses timestep t=1, and the model will basically do nothing to the image. I think the correct way to do it is to use t=999 for one-step denoising.
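The t=1 versus t=999 point can be illustrated with a small NumPy sketch. It mimics, to the best of my understanding, the "leading" and "trailing" timestep spacing rules used by DDIM-style schedulers; it is not the diffusers code itself. The function names `ddim_timesteps` and `alpha_bar` are hypothetical, and the defaults (`steps_offset=1`, a scaled-linear beta schedule from 0.00085 to 0.012 over 1000 steps) are assumptions in the style of Stable Diffusion configs, not values confirmed in this thread.

```python
import numpy as np


def ddim_timesteps(num_inference_steps, num_train_timesteps=1000,
                   spacing="leading", steps_offset=1):
    """Hypothetical helper: pick the timesteps a DDIM-style sampler visits."""
    if spacing == "leading":
        # Start at 0, step by an integer ratio, then shift by the offset.
        step_ratio = num_train_timesteps // num_inference_steps
        t = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1]
        return t.astype(np.int64) + steps_offset
    if spacing == "trailing":
        # Start at the last training timestep and walk down.
        step_ratio = num_train_timesteps / num_inference_steps
        t = np.round(np.arange(num_train_timesteps, 0, -step_ratio))
        return t.astype(np.int64) - 1
    raise ValueError(f"unknown spacing: {spacing}")


def alpha_bar(t, num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Hypothetical helper: cumulative signal level under an assumed
    scaled-linear beta schedule (betas linear in sqrt space)."""
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5,
                        num_train_timesteps) ** 2
    return np.cumprod(1.0 - betas)[t]


if __name__ == "__main__":
    for spacing in ("leading", "trailing"):
        t = ddim_timesteps(1, spacing=spacing)
        ab = alpha_bar(t[0])
        # sqrt(1 - alpha_bar) is the weight of the predicted noise
        # in the one-step x0 estimate.
        print(spacing, t, "alpha_bar =", float(ab),
              "noise weight =", float(np.sqrt(1.0 - ab)))
```

Under these assumptions, a single "leading" step lands on t=1, where alpha_bar is close to 1 and the predicted noise carries very little weight, so a pure-noise input comes back almost unchanged. A single "trailing" step lands on t=999, where the network's prediction determines nearly all of the output.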

w-hc commented 6 months ago

Seconding Jiahao. The NFE=1 result for Marigold should be much better.