recoilme opened 1 year ago
My experiment has dynamic strength for each frame (driven by the music amplitude). When I fix the strength, it glitches. I think we need more experiments. Ideally we should apply a big weight to the last frame and smaller weights to the previous ones.
For example, with a current denoising strength of 0.84, it could be:

- last frame: denoising strength 0.75
- last frame -1: denoising strength 0.5
- last frame -2: denoising strength 0.25
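That falloff could be computed with a small helper, a minimal sketch assuming a linear decay (the function name and the decay rule are mine, not part of Deforum):

```python
def frame_strengths(base: float, step: float, n_frames: int) -> list[float]:
    """Denoising strength per previous frame, newest first.

    Each older frame loses `step` strength, clamped at 0 so frames far
    back in the loopback contribute nothing.
    """
    return [max(base - step * i, 0.0) for i in range(n_frames)]

# matches the example above: 0.75, 0.5, 0.25
print(frame_strengths(0.75, 0.25, 3))  # → [0.75, 0.5, 0.25]
```

An exponential decay (multiplying by a factor < 1 per step) would be the other obvious choice; which one glitches less is exactly what needs experimenting.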
If you would like to contribute to the repo, we can try merging your changes.
Just wanted to mention that for this Loopback ControlNet to work in video2video, you need to drop the Strength Schedule to 0:(0) or near 0, or it all goes wrong. If your video looks like someone moving around behind wallpaper, that's the reason. Here's even 0.15 SS with CN Loopback on:
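For anyone new to Deforum schedules: they are keyframe strings of the form `frame:(value)`. A minimal sketch of the setting in question (the key name follows the Deforum settings file, but check your version):

```
"strength_schedule": "0:(0)"
```

or, to keep a tiny bit of denoising strength:

```
"strength_schedule": "0:(0.05)"
```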
And by the way, if anyone knows how to get someone smoking, hit me up, because I could not do it, even with advanced prompt engineering and LoRA models of smoking.
-V
I think that the results in that paper could be replicated with Loopback, but even better would be a "Use previous image" option, which would let you use the last frame as the CNet image. Think of all the ways you could blend frames if we could use the previous image as input... Loopback is a good preview of the possibilities, but, for example, we could use Tile at 0.75 the same way we use Strength Schedule, I think. Maybe this is kind of extra with loopback available, but it would give us multiple different ways to control how your previous frame affects the next one.
https://anonymous-31415926.github.io/ - I looked at this paper (Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation).
The bottom line is that the input for each frame is the previous frame plus a certain cross-frame reference carrying textures, colors and shapes. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors.
In short, I honestly didn't understand a damn thing, except that if you set it up right, you can get an awesomely consistent video.
I set up Deforum with a bunch of ControlNet models and started experimenting. First I came up with the following combination of models:
Generation speed dropped from 6 iterations/sec to 1. However, the authors of the paper write that they also need 16 GB of video memory.
But the generations were still inconsistent. Then I added a reference to the very first frame:
And then a miracle happened: the girl stopped mutating like crazy.
https://github.com/deforum-art/deforum-stable-diffusion/assets/417177/79428ba9-c9d0-4dcd-b3f7-402889ff3071
I decided to test the theory that it is actually not the number of models that matters, but their quality. You have to give the model two reference signals (slightly different) so that it understands which frames to tie together. I threw out OpenPose, SoftEdge and reference_adain and left only 2 ControlNet units, both set to reference. The first takes the last frame; the second takes the very first frame. And that combination gave even more consistent results. And by the way, the speed doubled to 2 iterations per second, because it's 2 models instead of five:
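A minimal sketch of that two-unit setup, in the spirit of Deforum's ControlNet settings (the key names here are illustrative, not exact; check the attached settings file for the real ones):

```
# ControlNet unit 1: reference mode, fed the previously generated frame (loopback)
cn_1_module = "reference_only"
cn_1_loopback = true

# ControlNet unit 2: reference mode, pinned to the very first frame for the whole run
cn_2_module = "reference_only"
cn_2_input_image = "frame_0000.png"
```

The point of the second unit is to anchor every frame to one shared appearance, while the first unit keeps frame-to-frame motion smooth.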
https://github.com/deforum-art/deforum-stable-diffusion/assets/417177/b3d16ab9-1e89-4318-a05c-1a30292c2fe5
Actually, I think it's even simpler than that. You need to give it two frames, the last and the penultimate, so that the model rolls the animation over them. That way you don't have to adjust each frame by hand, and it smooths things out more properly, because the difference between the first and last frame grows with the distance the animation has covered. In general, you get a zero-shot video-to-video built out of sticks and shit, just the way we like it. The Deforum settings for those who want to reproduce it are attached: deforum_settings_cntrl (1).txt
And to understand the difference, here's the version without ControlNet, when the model starts to dance (everything zeroed out):
https://github.com/deforum-art/deforum-stable-diffusion/assets/417177/7489f9a3-4f44-4363-8d3e-ecc823687c6e
I haven't tested video2video with a mask and so on, but I'm sure it should work. So, attention is all you need: just feed in two frames instead of one.
https://dump.video/i/B1PLxztF.mp4