brycedrennan / imaginAIry

Pythonic AI generation of images and videos
MIT License

adaptive sampler types not respecting previous step anymore #145

Closed yourbartender closed 1 year ago

yourbartender commented 1 year ago

I've run some tests with different sampler types (e.g. k_euler_a) to create a sort of animation by using the same seed, prompt, strength, etc., and only increasing the step count (I haven't found a better way to do this than creating a text file with individual prompts, one per line). However, I noticed that in 7.2.0 this no longer works: the seed doesn't seem to be respected at all, and there is no similarity whatsoever between adjacent steps. I wonder if this is a new feature or a bug…
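
Roughly, what I'm doing is equivalent to this loop (a minimal sketch that just shells out to the `imagine` CLI with the flags below; in practice I generate a text file with one prompt per line, but the idea is the same, and the step range, prompt, and output directory here are placeholders):

```python
# Sketch: render the same seed/prompt at increasing step counts, one image per
# step count, so the frames can later be strung together into an animation.
# The step range, prompt text, and output directory are placeholders.
import subprocess

PROMPT = "your prompt text here"  # same prompt for every frame
SEED = 6947113

for steps in range(5, 40):
    subprocess.run(
        [
            "imagine",
            "--seed", str(SEED),
            "--sampler-type", "k_euler_a",
            "-w", "1024", "-h", "512",
            "--steps", str(steps),
            "--prompt-strength", "11.01",
            "--outdir", "LandscapeAnim",
            PROMPT,
        ],
        check=True,
    )
```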

The prompts for both of these are pretty similar:

imagine --seed 6947113 --sampler-type k_euler_a -w 1024 -h 512 --steps 7 --outdir LandscapeAnim --prompt-strength 11.01 "Lapland forest in winter, abandoned dystopic technology, futuristic technology. ruins, nature taking over buildings, unreal engine 5, photorealism, cinema 4d, 3D, cinematic, professional photography, rendered by Beeple, WLOP, trending on artstation, 4K UHD image, surreal, detailed, intricate" (screenshot 1) and

imagine --seed 588566485 -s 8 --model ../mdjrny-v4.ckpt --tile-x --negative-prompt "" --prompt-strength 18.0 -y k_euler_a -w 768 -h 384 -u "mdjrny-v4 style 360 degree equirectangular panorama. A dystopian future winter forest, sci-fi ruins, digital illustration, futuristic, weird alien architecture, distroyed spaceships, nature taking over, highly detailed, unreal engine, artstation, surreal, golden hour" (screenshot 2)

I used a custom model in the second one, but I also tried with normal SD 1.5, removed the negative prompt by declaring it "" (a horrible feature imho, by the way), tried without the equirectangular panorama settings, tried other ancestral samplers, and tried shortening the prompt. All in all, the similarity between the steps is totally gone.

Screenshot 01 Screenshot 02

brycedrennan commented 1 year ago

With k_euler_a specifically I have also seen this inconsistency. All the other samplers are fine though, I believe. Does using a different sampler work well enough?

Also... midjourney 4 is just some model weights I can use? I had no idea.

As for your animations, you might consider adding --show-work to one of your generations and looking at the output in the steps folder

brycedrennan commented 1 year ago

I have not yet identified the root cause of this issue.

yourbartender commented 1 year ago

At least k_dpm_2_a produced the same result: some similarity, but nothing as smooth as it "used to be". I didn't try k_dpmpp_2s_a yet.

There is a Midjourney v4 model for Stable Diffusion downloadable on Hugging Face. But perhaps textual inversion embeddings will soon make this unnecessary…

--show-work seems to output only a single image generation, from the initial noise to the final fully rendered image; that is not the same as the difference between, e.g., steps 10 and 20 (which is what I wish it would show).

brycedrennan commented 1 year ago

--show-work should output the image at each step. Is that not what you're seeing?

yourbartender commented 1 year ago

It creates what I call the "image creation" process: from noise to the final result. Whereas it used to be that, e.g. with k_euler_a, by around step 10 you already had a decent-looking image, and steps 11, 12, 13, etc. changed that image a little, eventually creating a different landscape. --show-work doesn't really show that change; rather, it shows an image creation process with only slight changes. Come to think of it, I'm not sure if I tried this after --show-work was fixed (it was a bug fixed at some point, right?), so perhaps something changed at that point.

brycedrennan commented 1 year ago

imagine "a bowl of fruit" --show-work

https://user-images.githubusercontent.com/1217531/208845740-d8afdf73-f6c1-4375-ad53-b16c4fbc1f7d.mov

yourbartender commented 1 year ago

I'll make two videos which illustrate the issue if that helps. I know SD isn't currently really designed to put out videos or animations, but hey, I am an experimental media artist and want to try out new things :) So essentially my target is to create an "animated / morphed" 360 video, which is now possible thanks to the amazing equirectangular feature.

brycedrennan commented 1 year ago

There are different kinds of images output. The ones you'd most likely want are the "predicted_latent" ones.
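
If it helps, here's a rough sketch of stitching those step images into a clip with ffmpeg (the frame rate and the filename glob are guesses; adjust them to whatever actually lands in the steps folder):

```python
# Sketch: assemble the per-step images written by --show-work into a video with
# ffmpeg. The glob pattern and frame rate are hypothetical; adjust them to the
# actual filenames in your steps/ directory.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-framerate", "8",
        "-pattern_type", "glob",
        "-i", "steps/*predicted_latent*.png",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "steps_animation.mp4",
    ],
    check=True,
)
```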

brycedrennan commented 1 year ago

If you're up for a quick chat, I'm trying something ambitious with animation right now. You might have some ideas on how to improve it.

yourbartender commented 1 year ago

This video has film smoothing between frames, but one can clearly see the "ancestral" value here: all the new changes are based on an earlier rendered image.

https://user-images.githubusercontent.com/15839644/208846012-0ee0e0fb-c148-43d5-9cd6-a17c55f59d8b.mp4

brycedrennan commented 1 year ago

ah so this is more of an img2img thing?

brycedrennan commented 1 year ago

if you're up for a chat. totally fine if you're not

yourbartender commented 1 year ago

Something I came up with just now: could it be the prompt length for some reason? I notice that if the prompt is too long, it doesn't get written to the EXIF metadata completely. I shortened it and seem to get more consistent "morphing"-style changes between frames. So perhaps the whole prompt is not respected. "A dystopian winter forest, sci-fi ruins, futuristic, weird alien architecture, highly detailed, artstation, golden hour" works better than "mdjrny-v4 style 360 degree equirectangular panorama. A dystopian future winter forest, sci-fi ruins, digital illustration, futuristic, weird alien architecture, distroyed spaceships, nature taking over, highly detailed, unreal engine, artstation, surreal, golden hour"
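
One way to sanity-check that hypothesis (a sketch assuming the Hugging Face CLIP tokenizer that SD 1.x models use; the CLIP text encoder context is 77 tokens, so anything beyond that gets truncated):

```python
# Sketch: count how many CLIP tokens a prompt uses. SD 1.x text encoders use a
# CLIP tokenizer with a 77-token context window (including start/end tokens).
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = (
    "mdjrny-v4 style 360 degree equirectangular panorama. A dystopian future "
    "winter forest, sci-fi ruins, digital illustration, futuristic, weird alien "
    "architecture, distroyed spaceships, nature taking over, highly detailed, "
    "unreal engine, artstation, surreal, golden hour"
)

token_ids = tokenizer(prompt)["input_ids"]
print(f"{len(token_ids)} tokens (limit is 77, including start/end tokens)")
```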

yourbartender commented 1 year ago

https://user-images.githubusercontent.com/15839644/208872355-729da651-bb98-41d8-8e6c-7d82e5024450.mov

Ok, I promised to do a comparison video, so here is one, slowed down. There are a few more frames in the --show-work version on the left, but one can clearly see that it is not saving the actual steps. There are 35 steps in the left video; I think the one with the iterating step number in the prompt ends at 31 (and, as I said, a few steps are missing). I noticed I used a non-ancestral sampler type in this example, so what happens on the right is only some minor changes, and at some point too many steps just break the image. In my earlier tests the ancestral step saving behaved more nicely, with some changes happening in the image, but nothing as jumpy between steps as the current results seem to be.

yourbartender commented 1 year ago

Also, one more thing to consider / try out with me is the resolution: 512 x 512 or some unusual values might produce different results. 512 x 1024 might work since it is essentially double the base image size? It would be handy for the equirectangular stuff. I have currently been testing with -w 768 -h 384, and I notice my earlier tests were mostly 512 x 1024 (although 704 x 384 did seem to produce the effect I am looking for, with a totally different prompt). Perhaps it is time to rethink and focus on the new spatial image possibilities…

brycedrennan commented 1 year ago

I didn't do a good job explaining when we talked, but the version on the left is definitely showing the image generation process at each step. The video on the right is (probably) showing what the image would look like if it were generated with a different number of steps. This sounds like the same thing, but it's different. (I say probably since I haven't seen your script.)

Steps are more like subdividing the work that needs to be done. If an image was generated with 100 steps, then each step did 1% of the work and 100% of the work got done. If an image was generated with 10 steps, then each step does 10% of the work.

Said another way: for image A, I tell it to generate in 20 steps. For image B, I tell it to generate in 100 steps but stop it at step 50. Which image will look better? Image A will look way better, because it completed 100% of the work, while image B only completed 50% of the generation process. So even though image B took 50 steps, it will look unfinished compared to the 20-step image.
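
The same point as a quick bit of arithmetic:

```python
# How "finished" an image looks depends on the fraction of its scheduled steps
# that have run, not on the absolute step count.
image_a_progress = 20 / 20    # 20 of 20 scheduled steps  -> 1.0 (fully denoised)
image_b_progress = 50 / 100   # 50 of 100 scheduled steps -> 0.5 (looks unfinished)
print(image_a_progress, image_b_progress)
```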

brycedrennan commented 1 year ago

Fixed in 7.3.0

yourbartender commented 1 year ago

Ok, I know this is closed, but I just wanted to share this experiment: with 7.3.0 I made an experimental "animated" 360 landscape utilizing the ancestral behaviour of k_euler_a. The original was a 1024x512 video, and YouTube really crunched that video to rubbish… Now I am wondering whether there is a way to choose the scale of the randomness / ancestral behaviour: now different steps produce only a small difference, while in earlier versions the difference was too huge… So is it either on or off, or could it be adjusted (e.g. as a float) between 0.0 and 1.0?

brycedrennan commented 1 year ago

Cool video. How'd you make it 360?

I'm not sure you have the right mental model. What you describe sounds more like generating an image and then passing it into the next generation via --init-image, controlling its influence via --init-image-strength 0.2 (a number between 0 and 1).
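
Roughly, that chained loop could look like this (a sketch that only uses the flags above; the strength value, paths, frame count, and the output filename are placeholders):

```python
# Sketch of an img2img-style animation loop: each frame is generated from the
# previous frame via --init-image, with --init-image-strength controlling how
# much of the previous frame carries over. Paths, the frame count, the strength
# value, and the output filename are placeholders.
import subprocess

prev_frame = "frames/frame_000.png"  # a first frame generated normally

for i in range(1, 60):
    out_dir = f"frames/{i:03d}"
    subprocess.run(
        [
            "imagine",
            "--init-image", prev_frame,
            "--init-image-strength", "0.2",
            "--outdir", out_dir,
            "your prompt text here",
        ],
        check=True,
    )
    prev_frame = f"{out_dir}/generated.png"  # hypothetical output filename
```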

yourbartender commented 1 year ago

Equirectangular images with an ancestral sampler and a slow fade in between -> basically a slideshow rendered to a video file. YouTube has a tutorial on how to upload a video as 360: basically one "injects" 360 metadata into the video file (there is an app / Python script for that). I'm now in the process of upscaling the original images to try to produce a better-resolution version.
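
For reference, the injection step looks roughly like this (assuming Google's spatial-media metadata injector script; check its README for the exact invocation and filenames):

```python
# Sketch: inject 360 / equirectangular metadata into a finished video so YouTube
# treats it as a 360 video. Assumes Google's spatial-media metadata injector
# (github.com/google/spatial-media); the filenames here are placeholders.
import subprocess

subprocess.run(
    ["python", "spatialmedia", "-i", "landscape_anim.mp4", "landscape_anim_360.mp4"],
    check=True,
)
```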

2nd part: I was just wondering how this behaviour of k_euler_a was changed. I found a threshold value in the code; I guess it is a value which marks that a "step" is complete (one step is thousands of operations, right?), and I wondered whether changing this value, or some other one, could give the ancestral samplers a setting for how much they take the previous step into consideration. Kind of like how the init-image value behaves, which I understand is what the _a samplers should somehow be doing. But speaking of init-image, it would be worth trying that method out, or the depth model which just came out, but let's see.

yourbartender commented 1 year ago

Here is info on how to upload 360 videos.