atong01 / conditional-flow-matching

TorchCFM: a Conditional Flow Matching library
https://arxiv.org/abs/2302.00482

Inpainting with CFM #123

Open lukasschmit opened 1 month ago

lukasschmit commented 1 month ago

Been having great success using CFM over diffusion methods for audio tasks so far, kudos for the great library!

One thing I'm having trouble wrapping my head around is the most correct way to formulate the inpainting task.

With denoising diffusion, the RePaint method is extremely intuitive and works well in practice, but I think it's more complicated for flow matching?

atong01 commented 1 month ago

Cool! I have not experimented with this. I'm curious if you've tried the same strategy for flow matching? My feeling is the same trick may work.
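Roughly what I have in mind, as an untested sketch (assuming the usual torchcfm convention of x0 ~ noise at t=0, x1 ~ data at t=1, the linear path x_t = (1 - t) * x0 + t * x1, a plain Euler sampler, and a vector field with a `model(t, x)` call signature; RePaint's resampling loop is omitted):

```python
import torch

@torch.no_grad()
def repaint_style_inpaint(model, x_known, mask, n_steps=100):
    """RePaint-style inpainting sketch for flow matching (untested).

    model:   learned vector field v(t, x); the (t, x) signature is an assumption
    x_known: clean sample, valid in the known region
    mask:    1 where values are known, 0 where they must be generated
    """
    x = torch.randn_like(x_known)  # x0 ~ N(0, I), start of the flow
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        # Re-impose the known region with the analytic interpolant at time t
        # (analogous to RePaint sampling the known region from the forward process).
        x_t_known = (1 - t) * torch.randn_like(x_known) + t * x_known
        x = mask * x_t_known + (1 - mask) * x
        # Euler step along the learned vector field.
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        x = x + model(t_batch, x) * dt
    # At t = 1, paste back the exact known content.
    return mask * x_known + (1 - mask) * x
```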

lukasschmit commented 1 month ago

@atong01 I think it should, but with the caveat that you might have to integrate the clean target through the vector field (network) up to the current noisy timestep. Just using sample_xt like we do for training did not work.
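One way to read "integrate the clean target through the vector field", as a rough, untested sketch (same conventions as the snippet above; it also assumes the learned ODE can simply be run in reverse, which is an assumption on my part):

```python
import torch

@torch.no_grad()
def known_state_at_t(model, x_clean, t_target, n_steps=50):
    """Integrate the clean target backwards along the learned flow,
    from t = 1 down to t = t_target, instead of using sample_xt.
    Untested sketch; conventions and signature are assumptions.
    """
    x = x_clean.clone()
    dt = (1.0 - t_target) / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        # Reverse Euler step: walk dx/dt = v(t, x) backwards in time.
        x = x - model(t_batch, x) * dt
    return x
```

The x_t_known in the loop above would then come from this instead of the linear interpolant, at the cost of an extra inner integration per outer step.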

I think there is another possible approach: use the mask/clean target to zero out the vector field (network output), i.e., indicate that the unmasked regions have no derivative and don't change at any timestep, and then at every single network forward pass force the input in those regions to be the clean target via the mask. But with this approach the network input would be noisy in some regions and clean in others, which is a training/inference mismatch if the network was not trained with only some regions being corrupted.
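Roughly, under the same assumptions as above (untested):

```python
import torch

@torch.no_grad()
def masked_vf_inpaint(model, x_known, mask, n_steps=100):
    """Sketch of the second idea: clamp the known region to the clean target
    at every forward pass and zero its velocity, so only the masked-out
    region is transported by the flow. Untested; signature is an assumption.
    """
    x = torch.randn_like(x_known)
    # Known region is held at the clean target from the very first step.
    x = mask * x_known + (1 - mask) * x
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t_batch = torch.full((x.shape[0],), i * dt, device=x.device)
        v = model(t_batch, x)
        # Zero the derivative on the known region: it must not change.
        x = x + (1 - mask) * v * dt
        # Re-clamp against numerical drift.
        x = mask * x_known + (1 - mask) * x
    return x
```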

dapaoA commented 3 weeks ago

@lukasschmit

I think a training-free inpainting method would work for both SGMs and FM; the sampling process is no different. Could you share more details? RePaint is no longer the best training-free inpainting method; you could check this flow-based RePaint-style method: https://arxiv.org/pdf/2310.04432