comfyanonymous / ComfyUI

The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Feature Request: Implementation of DMD (corrected title) #3167

Open rodneyviana opened 3 months ago

rodneyviana commented 3 months ago

After the MIT paper on 1-step generation using the DMD technique, I was wondering if there are any plans to implement the new technique. For reference: MDM MIT

NeedsMoar commented 3 months ago

You should probably fix the title and link; there's no model from MIT called MDM. Right now it reads like DMD has already been implemented somewhere and MDM is the new thing ("After the MIT paper... the new technique"), although a model called MDM does exist, for generating natural human motion: https://github.com/GuyTevet/motion-diffusion-model

That article reads like it was partly AI generated and isn't very useful. The MIT paper is here: https://arxiv.org/abs/2311.18828, released last December... unfortunately training it is extremely expensive compared to SDXS (SDXS can apparently be trained on a single consumer GPU in a fairly small number of iterations, while DMD took 36 hours on 72 A100s with an SD 1.5 base model).

The only thing that comes up in a search is this (and a few others by the same user), with no explanation, on Hugging Face: dreamshaper-8-dmd-48k (https://huggingface.co/aaronb/dreamshaper-8-dmd-48k)... note that this isn't the repo of the people responsible for Dreamshaper.

One of the models mentions "kl-only" in the title, and the paper specifically includes a KL-loss calculation algorithm, so that one likely omits part of the paper's method as a test.

Anyway, I downloaded that UNet and loaded it in Comfy via the UNETLoader node (the checkpoint loader fails to identify the model type), with an original SD 1.5 checkpoint in the regular checkpoint loader supplying the CLIP, since none of the other checkpoints I had lying around worked, and played with settings until it ran. It didn't require any manual model-type nodes like the v-prediction models do, and as advertised it generates in 1 step. Sampler and scheduler had little to no effect, except that some produce corrupt images, mainly the SDE samplers. Just use LCM / Normal and you'll be fine.

A 512x512 batch of 64 images on the 4090 executed in 4.25 seconds with the full-size VAE, about 0.066 s per image, and in 3.09 s with TAESD (roughly 20.7 images/s). That's just over the 20 fps the paper claims, so I'm guessing it is indeed one of the DMD models. Not as fast as SDXS with a regular VAE on my machine, but the SDXS pre-release model is awful at 1-step generation and we'll be waiting a while for a working one.
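Just to spell out the arithmetic behind those numbers (nothing new, only the batch size and timings quoted above):

```python
# Throughput from the batch timings quoted above: a 64-image 512x512 batch on a 4090.
batch_size = 64
full_vae_seconds = 4.25   # batch time with the full-size VAE
taesd_seconds = 3.09      # batch time with the TAESD decoder

print(full_vae_seconds / batch_size)   # ~0.066 s per image (~15 images/s) with the full VAE
print(batch_size / taesd_seconds)      # ~20.7 images/s with TAESD, just over the paper's 20 fps
```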

They didn't mention anything in the paper about weird sampling methods, although there's a denoising formula. Here's the workflow.
workflow(1): https://github.com/comfyanonymous/ComfyUI/assets/119545088/464efe3c-e35c-4ceb-8bae-a147a255be81
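For anyone who can't load the workflow image, here is a rough sketch of the same graph in ComfyUI's API ("prompt") format, written as a Python dict. The file names are placeholders (use whatever you saved the DMD UNet and an SD 1.5 checkpoint as), and the wiring just mirrors the setup described above: UNETLoader for the model, the SD 1.5 checkpoint only for its CLIP and VAE, LCM / Normal, 1 step, CFG 1.

```python
import json
import urllib.request

# Rough ComfyUI API-format graph mirroring the setup described above.
# File names are placeholders for whatever you saved the models as.
prompt = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "dreamshaper-8-dmd-48k.safetensors"}},
    "2": {"class_type": "CheckpointLoaderSimple",   # only used for its CLIP and VAE
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a photo of a cat", "clip": ["2", 1]}},
    "4": {"class_type": "CLIPTextEncode",           # empty negative prompt
          "inputs": {"text": "", "clip": ["2", 1]}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["5", 0], "seed": 0, "steps": 1, "cfg": 1.0,
                     "sampler_name": "lcm", "scheduler": "normal", "denoise": 1.0}},
    "7": {"class_type": "VAEDecode",
          "inputs": {"samples": ["6", 0], "vae": ["2", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "dmd_test"}},
}

# Queue it on a locally running ComfyUI instance (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": prompt}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```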

NeedsMoar commented 3 months ago

One side effect of their difference-based training is that CFG has to be 1. Other values just produce double exposures, with additional ghost images as the value increases. I couldn't run more than 2 steps without the image decaying into a noisy, faded-out state, and the second step doesn't really add much of value anyway.

NeedsMoar commented 3 months ago

Prompts all converge towards "young Asian woman wearing a black jacket and jeans with no shirt on under it and nipples censored" as CFG approaches zero, by the way. It also seems like for any given prompt you'll just get hundreds of very slight variations on the same image rather than the high variance a normal model would show. Same for negative prompts with no positive. Setting CFG to 0 and using "topless asian girl" as the prompt produces a flesh tube with hair at the top and some horror of a face.

I could see it being useful with some form of video2video since the results are so consistent for given conditioning but for general image generation it isn't useful unless you want tons of small variations of an image.

rodneyviana commented 3 months ago

I found this: https://github.com/Zeqiang-Lai/OpenDMD. I tried to make it work, but I couldn't: I downloaded the project and the alleged models but only got noise. The author does not offer instructions, and the paper does not point to any repo. Nobody seems to be talking about this either.


IDKiro commented 3 months ago

I uploaded a new version of SDXS-512 for the community. I would be glad if you could try it:

https://huggingface.co/IDKiro/sdxs-512-dreamshaper
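
For anyone wanting a quick way to try it, something like the following diffusers snippet should work, assuming the repo loads as a standard SD pipeline; check the model card for the exact recommended settings (my guess is 1 step with CFG disabled, like the DMD model discussed above):

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumes IDKiro/sdxs-512-dreamshaper loads as a regular SD 1.5-style pipeline;
# see the model card for the recommended scheduler and settings.
pipe = StableDiffusionPipeline.from_pretrained(
    "IDKiro/sdxs-512-dreamshaper", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photo of a cat",
    num_inference_steps=1,   # single-step generation
    guidance_scale=0.0,      # distilled models bake guidance in, so CFG is off
).images[0]
image.save("sdxs_test.png")
```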