311-code opened 6 months ago
Xformers removed and other improvements: Comfy-SVDTools-modification-w-pytorch.zip
By downloading this you agree to help me improve it or fix bugs, haha, jk. If the dev sees anything here useful, feel free to implement it, of course!
Disclaimer: I have concerns this is not a direct 1:1 conversion of the repo; I can't compare, since I can't get the original repo to work.
ComfyUI doesn't use xformers by default anymore, so it was a pain getting this repo to work. I tried getting xformers working with ComfyUI anyway, but the node still didn't work.
I've fixed it to work with ComfyUI's default PyTorch attention and added some enhancements.
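If it helps anyone digging into the code, the heart of the change is just swapping xformers' memory-efficient attention call for PyTorch's built-in scaled_dot_product_attention. This is only a rough sketch of that swap (not the exact patch; the tensor layouts are assumed):

```python
import torch
import torch.nn.functional as F

# Sketch of the xformers -> PyTorch swap, assuming the usual layouts:
# xformers takes (batch, seq_len, n_heads, head_dim),
# torch's scaled_dot_product_attention takes (batch, n_heads, seq_len, head_dim).

def attention_pytorch(q, k, v, attn_mask=None):
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
    return out.transpose(1, 2)

# Replaces something along the lines of:
#   out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=attn_mask)
```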
If you use this and improve it, please let me know. I can't really maintain a fork right now due to time constraints, but I really do want to see SVD improved further.
There are a lot of things in here that may not be exactly right btw, but it still greatly enhances SVD generations for me. With the augmentation at about 0.15 and the motion bucket ID at ~60, I get natural movements and facial expressions, and it stays consistent at 48 frames when using whole-batch latent image inputs and latent blend, add, or multiply nodes mixed with the main SVD latent.
More info: this is done by feeding a 48-frame whole-batch latent input into the KSampler for a 48-frame SVD generation, and setting the timestep_embedding_frames on this node slightly lower. Using perturbed guidance also helps (the advanced one), set for blocks 0-5 on the output. Use a lower video linear guidance CFG of around 1 or 2; changing the linear guidance, sometimes up to 4, and adjusting the timestep amount has a big impact, so experiment.
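For reference, my understanding of the video linear CFG guidance is that it just ramps the CFG scale linearly across the frame batch, from min_cfg on the first frame up to your sampler CFG on the last, something like:

```python
import torch

def linear_cfg_scales(min_cfg: float, cfg: float, num_frames: int) -> torch.Tensor:
    # One CFG value per frame, ramping linearly from min_cfg to cfg,
    # e.g. linear_cfg_scales(1.0, 4.0, 48) for a 48-frame batch.
    return torch.linspace(min_cfg, cfg, num_frames)
```

So a min_cfg of 1-2 keeps the early frames loosely guided while the later frames get the full value.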
As I was saying, with this whole-batch setup, try some latent blends, or add and multiply operations, on the 48 batch images and mix them with the SVD latent.
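To be concrete about what I mean by the blend / add / multiply mixing (names here are made up; this is just the math the latent mix nodes boil down to, as far as I can tell):

```python
import torch

def mix_latents(svd_latent: torch.Tensor, batch_latent: torch.Tensor,
                strength: float = 0.9, mode: str = "blend") -> torch.Tensor:
    # Both latents are (48, 4, h, w) for a 48-frame SVD generation.
    if mode == "blend":
        # strength weights the SVD latent; higher keeps the result closer to it
        return svd_latent * strength + batch_latent * (1.0 - strength)
    if mode == "add":
        return svd_latent + batch_latent
    if mode == "multiply":
        return svd_latent * batch_latent
    raise ValueError(f"unknown mode: {mode}")
```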
I personally like to use a second WAS power noise ksampler for refinement.
Anyways, here is also the node for the 'whole_batch' option for the WAS batch image loader: Whole_batch_WAS_Node.zip (thanks to a redditor!). It requires ComfyUI commit 6c3fed70655b737dc9b59da1cadb3c373c08d8ed. If you improve on any of this, please let me know! Hope this helps someone.
Why not submit a pull request?
The attn windowing doesn't fully work, since it relies on xformers and isn't available yet in PyTorch from what I've read, but the code still makes SVD a lot better when turning on the timestep embedding stuff and other features. So it's more experimental, for testing, at the moment.
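In case it helps whoever picks this up: as I understand it, the attn windowing is supposed to restrict the temporal self-attention to nearby frames instead of letting every frame attend to all 48 at once. A simplified sketch of that idea using plain PyTorch attention (not the repo's actual xformers implementation):

```python
import torch
import torch.nn.functional as F

def windowed_temporal_attention(q, k, v, window: int = 16):
    # q, k, v: (batch, n_heads, frames, head_dim) at the temporal attention step.
    # Each frame only attends to frames within +/- window//2 of itself.
    frames = q.shape[2]
    idx = torch.arange(frames, device=q.device)
    keep = (idx[None, :] - idx[:, None]).abs() <= window // 2  # (frames, frames) bool
    return F.scaled_dot_product_attention(q, k, v, attn_mask=keep)
```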
Edited my original post. The whole-batch WAS node loader works with the nightly build now, so no need to go back to that old commit from my previous comment. Just use a latent blend node and blend the svd_img2vid_conditioning latent with the whole_batch WAS batch image loader latents of the 48 images (AI-generated or real images) at 0.5-0.9 strength (I like to weight it more toward the SVD latent, so 0.9 strength). Then feed that into the latent input on the KSampler. Make sure to also use this node on the model and latent before they hit the KSampler.
Fortunately, the flavor of ComfyUI I'm working on does have xformers by default, so I'm good there.
I'm working on creating a new "SVD Advanced" suite of nodes that incorporates this and some other stuff from around the webz into one place so we don't have to spaghettify everything...do you mind if I include this?
Sorry for the delay, and thanks for asking! Yeah, go right ahead. It did take me about 3 weeks though lol, so maybe you could just put a small comment somewhere, #Thanks to BuckJohnston, for the few parts that are clearly original and not just rewrites of what this repo does, like maybe some of the def patched_model and patched_forward sections? Or if you change it completely, credit whatever ideas you keep ;) Honestly though, if not it's fine, I just want this to get better.
I think a lot can be adjusted and improved here. I also have some preliminary StoryDiffusion code I recreated from the paper for consistent self-attention, if you are interested. Any chance you could add me on Discord? It's buckjohnston. I also really want to check your progress and assist any way I can, so keep me updated on your SVD Advanced suite.
Xformers removed and a few misc. improvements. This greatly enhances SVD video for me: Comfy-SVDTools-modification-w-pytorch2.zip. Note, this is experimental and the attn windowing relies on xformers. This is for if you don't want to convert your ComfyUI to xformers but still want a few benefits for SVD consistency.
ComfyUI doesn't use xformers by default anymore, so it was a pain getting this repo to work. I tried doing
python_embeded\python.exe -m pip install xformers --no-deps
but I am on the nightly build at the moment and still can't build the xformers wheel with cmake for Python 3.12 and CUDA 12.3. I may revert at some point. I've modified this to work to some extent with default ComfyUI PyTorch, and added some enhancements.
There are a lot of things in here that may not be exactly right btw, but it still greatly enhances SVD generations for me. Put the augmentation at about 0.15 and the motion bucket ID at ~60, or even as high as 190.