Suggestions to improve grad videos

Kahsolt / stable-diffusion-webui-prompt-travel

Travel between prompts in the latent space to make pseudo-animation, extension script for AUTOMATIC1111/stable-diffusion-webui.

The Unlicense

252 stars 22 forks source link

Suggestions to improve grad videos #2

Closed seihoukei closed 1 year ago

seihoukei commented 1 year ago

I've experimented quite a bit with the extension and ended up in video/gif editor every time, so decided to list some things that seemed annoying yet could probably be easily remedied.

These improvements seem to be trivial:

Option to only output every other frame into video (solves pingpong issue, can technically be done by using even step count)
Option to change interrupt behavior so that last two frames (the unfinished one and final frame, probaby not reached) are not added.
Option to start and finish video at some percentage (20% to 80% tend to be most meaningful for 2-point travel)

And these might be a lot to ask:

Option to output a GIF (pingpong or repeating, with optionally extended durtion of first/last frames)
Option to drop frames that have high similarity (with any reasonable kind of threshold)

Kahsolt commented 1 year ago

Thanks for your attention and kind suggestions~ I would try to fix 1~4 by adding new video export options. As for 5, I will re-enable subseed, you can try that maybe. But still I would highly encourage you to use higher FPS as a work around, or pick only even steps like in suggestion 1.

PS: I guess that you might be mainly annoyed by the tailing broken or un-consecutive images in the video end? :lol Because currently this repo is more like an experimental dice-rolling playground than a product-level applet. In this sense, the exported video is exactly a full expriment recording to me, allowing us to check the sampler performance (non-converging ping-pong phenomenon, the converging direction and speed, etc.), hence I did not kick out the bad ones.

seihoukei commented 1 year ago

I understand that this is an expeiment, but it's very interesting one, absolutely worth exploring more if possible. The conceptual morphing coming out as a result is superior to classic geometric interpolations in many ways. Sadly I don't have a good grasp on what's happening in latent space so I can't give ideas on improving pathfinding, but even in current state, grad is interesting.

Option 5 was more about equalizing velocity of a video that would otherwise get slow down on some state that's spanning wider and takes longer to pass through, which contrasts with rapid changes happening in other frames.

Kahsolt commented 1 year ago

Option 1~4 should be available now in last commit. 😃 I tried a bit on Option 5, but find that the reasonable threshold varies from prompt to prompt, it's not likely to be auto determined without interactive tuning. Currently this script is stateless, and the file exporting could not be separate from the image generating procedure. so I just give up attempts for the moment (

Though the grad mode seems working, it's behaviour is somewhat out of my expectation 🤣 Hence I'm continuing to hack into the samplers. The gradients should be recomputed at each sampling step (current implementation only approximate gradients of the final step), that sounds should be more accurate in my sense. I will experiment this it in another repo and maybe backport it later.