continue-revolution / sd-webui-animatediff

AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI
Other
3.11k stars · 258 forks

Advices on consistency for longer videos? #241

Open rkfg opened 1 year ago

rkfg commented 1 year ago

Expected behavior

WARNING! Many animated GIFs, ≈9 Mb each.

It's not an issue for 16-20 frames, but anything longer often looks like it consists of two quite different parts. I enabled token padding as suggested, but it doesn't seem to improve the situation much (maybe it addresses a different issue). The best consistency improvers are a higher CFG (9+) and more steps (30), but for longer videos they're still not enough. A higher CFG (12-14) also often introduces light flashes and generally unstable lighting.

Settings: 2023-10-22_00-40-29

I use a fine-tuned human motion model based on mm_v15_v2. The same issues arise with the vanilla v15_v2.

Result for 20 frames: 00076-2551851382 — quite good and stable.
Now 32 frames, overlap -1 (i.e. 4): 00077-2551851382 — everything is morphing, including the character, who sits differently in the first and second half of the video.
Same 32 frames, overlap 6: 00078-2551851382 — slightly better, at least the background isn't as chaotic. The character is still not very stable.
Same 32 frames, overlap 8: 00079-2551851382
Same 32 frames, overlap 10: 00080-2551851382 — getting somewhere; the morphing is still there but isn't as bad as in the beginning.
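To illustrate why a larger overlap helps, here is a minimal sketch of sliding-window context scheduling as I understand it: the motion module only sees `context_length` frames at a time, so a 32-frame video is denoised in overlapping windows, and frames shared between windows tie the halves together. The function below is a simplified illustration, not the extension's actual scheduler; the assumption that overlap = -1 defaults to a quarter of the context length is taken from the "-1 (i.e. 4)" note above.

```python
def sliding_windows(num_frames, context_length=16, overlap=-1):
    """Return the overlapping frame windows a 'num_frames' video is split into.

    Simplified illustration of sliding-window context scheduling;
    overlap = -1 is assumed to default to context_length // 4.
    """
    if overlap < 0:
        overlap = context_length // 4
    stride = context_length - overlap  # how far each window advances
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + context_length, num_frames)
        windows.append(list(range(start, end)))
        if end == num_frames:
            break
        start += stride
    return windows


# 32 frames, context 16, overlap 4: windows [0..15], [12..27], [24..31].
# Only 4 frames are shared between neighbors, so the halves can drift apart.
print(sliding_windows(32, 16, 4))
# Overlap 10 yields more windows with 10 shared frames each, which is
# consistent with the smoother results reported above.
print(sliding_windows(32, 16, 10))
```

With a small overlap, adjacent windows share only a few frames, so the model has little incentive to keep the background and character identical across the seam; raising the overlap trades generation time for consistency.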

Anything else I'm missing? Is it possible to somehow enforce better context preservation for longer videos, twice the context size and more? Or is this a fundamental limitation of the current tech? I'm currently not very interested in vid2vid, only in txt2vid. I know that guiding inference with a video should yield much better results.

yuu9703023 commented 1 year ago

This is the limit of what I can generate at the moment. 動畫2aaAA2 guitar