G-U-N / AnimateLCM

[SIGGRAPH ASIA 2024 TCS] AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data
https://animatelcm.github.io
MIT License

Discussion of sigma and timestep implementations #25

Closed · jiangzhengkai closed this issue 6 months ago

jiangzhengkai commented 6 months ago

@G-U-N Hi, thanks for your great work.

I have fine-tuned SVD using the following way of generating sigmas and timesteps, which is consistent with the EDMv2 paper.

My implementation:

[screenshot: my sigma/timestep sampling code]
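Roughly, the sampling looks like this (a minimal sketch in the style of SVD_Xtend, not the exact code from my screenshot; the `P_mean`/`P_std` values are placeholders):

```python
import torch

# Placeholder hyperparameters, not necessarily the values used in my run.
P_mean, P_std = 0.7, 1.6

def sample_sigmas_lognormal(batch_size: int, device: str = "cpu") -> torch.Tensor:
    """Continuous EDM-style training noise levels: ln(sigma) ~ N(P_mean, P_std^2)."""
    rnd_normal = torch.randn(batch_size, device=device)
    return (rnd_normal * P_std + P_mean).exp()

# Example: one noise level per sample in a batch of 16.
# sigmas = sample_sigmas_lognormal(16)
# If following EDM-style preconditioning, the network's timestep conditioning
# would then be something like c_noise = 0.25 * sigmas.log().
```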

EDMv2:

[screenshot: the corresponding formulation from the EDMv2 paper]

You seem to follow the original EDM paper's implementation. Have you compared it with the EDMv2 way?

G-U-N commented 6 months ago

Hi @jiangzhengkai, I'm very glad for your interest. I am not sure about the difference between the EDM and EDMv2 variants you mention, but it looks like your implementation is exactly how SVD was tuned? I also used that format. Could you elaborate more on that?

jiangzhengkai commented 6 months ago

@G-U-N Yes, I followed the public repo SVD_Xtend. After reading the EDM paper, I think your implementation is closer to the original paper. Have you compared the two implementation choices? Which one is better?

[screenshot: the EDM paper's table, with the sigma schedule boxed in red]
G-U-N commented 6 months ago

Hey @jiangzhengkai ,

I see. It's about the timestep/sigma sampling in training.

I'd like to clarify that the original EDM is actually trained as in your implementation, not mine. The red box is about sigma sampling for generation; for training, the sigma/timestep sampling is listed in the third row of the table, which is the same as your implementation.
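For reference, as far as I remember the EDM paper's table (the exact constants may differ), the two rows contrast roughly as:

$$\text{training: } \ln\sigma \sim \mathcal{N}\!\left(P_{\text{mean}},\, P_{\text{std}}^{2}\right), \qquad \text{generation: } \sigma_i = \left(\sigma_{\text{max}}^{1/\rho} + \tfrac{i}{N-1}\left(\sigma_{\text{min}}^{1/\rho} - \sigma_{\text{max}}^{1/\rho}\right)\right)^{\rho}, \quad i = 0,\dots,N-1.$$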

Coming back to my implementation: the reason I use discretized sigma/timestep sampling instead of the continuous sampling of the original implementation is to facilitate the consistency training process. Only in this way will the teacher diffusion model always induce the same empirical ODE trajectory, thanks to the fixed, discretized sigma sampling.
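A minimal sketch of what I mean by discretized sampling (the grid size, sigma range, and rho below are placeholder values, not necessarily what the released training code uses):

```python
import torch

def karras_sigma_grid(n: int = 1000,
                      sigma_min: float = 0.002,
                      sigma_max: float = 700.0,
                      rho: float = 7.0) -> torch.Tensor:
    """Fixed discretized sigma grid following the EDM (Karras et al.) schedule."""
    ramp = torch.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

def sample_sigmas_discrete(batch_size: int, sigma_grid: torch.Tensor):
    """Draw training noise levels only from the fixed grid, so the teacher's
    empirical ODE trajectory always passes through the same discrete points."""
    idx = torch.randint(0, sigma_grid.numel(), (batch_size,))
    return sigma_grid[idx], idx

# sigma_grid = karras_sigma_grid()
# sigmas, idx = sample_sigmas_discrete(16, sigma_grid)
```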

Hope this clarifies any confusion.

jiangzhengkai commented 6 months ago

@G-U-N Thanks for your explanation.