basujindal / stable-diffusion

Optimized Stable Diffusion modified to run on lower GPU VRAM

Increasing image size generates worse images #152

Open Peca21 opened 1 year ago

Peca21 commented 1 year ago

I wanted to generate a bigger image, but every time I increase the image size, the result gets worse. https://imgur.com/a/DLEiSGb My GPU is an RTX 3080 10GB.

bitRAKE commented 1 year ago

This is a result of the model being trained on 512x512 images. Outside of that, the model hallucinates - kind of like a holographic memory seeing similar projections - and they all composite together. It is possible someone will develop another model to combine these projections, but that remains to be seen.

The projections seem fewer if we reduce the modalities of the space. This means using prime multiples of 64 for the dimensions, i.e. 64 times a prime number, such as 448 (64 x 7) or 704 (64 x 11). It is only partially successful, in my experience.
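To make the "prime multiples of 64" suggestion concrete, here is a minimal sketch that lists candidate side lengths. It assumes the phrase means 64 times a prime; the function names and the 1024 upper bound are illustrative and do not come from the repo.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for the small factors used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_multiples_of_64(max_side: int, base: int = 64) -> list[int]:
    """Return side lengths base * p (p prime) that do not exceed max_side.

    Assumption: "prime multiples of 64" is read as 64 times a prime number.
    """
    return [base * k for k in range(2, max_side // base + 1) if is_prime(k)]


# Candidate widths/heights up to 1024 pixels.
print(prime_multiples_of_64(1024))
```

Any pair of these values could be tried as width and height; whether that actually reduces the duplication is, as noted above, only partially borne out in practice.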

Another technique is to use img2img and masking to generate the image in parts. This is kind of how the GoBig fork does its magic: https://github.com/jquesnelle/txt2imghd
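As a rough illustration of the "generate the image in parts" idea, the sketch below only computes overlapping tile rectangles for a large canvas. It is not the txt2imghd implementation: the 512-pixel tile size, the 64-pixel overlap, and the function name are assumptions, and the actual img2img call and mask blending for each tile are left out.

```python
def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64):
    """Return (left, top, right, bottom) boxes that cover a width x height canvas.

    Each box is at most tile x tile pixels, and neighbouring boxes share at
    least `overlap` pixels so the seams can be blended afterwards.
    Tile size and overlap are illustrative defaults, not values from the repo.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")

    def starts(length: int) -> list[int]:
        # A canvas no larger than one tile needs a single tile at offset 0.
        if length <= tile:
            return [0]
        step = tile - overlap
        offsets = list(range(0, length - tile, step))
        offsets.append(length - tile)  # final tile sits flush with the far edge
        return offsets

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Example: a 1024x1024 canvas split into 512x512 tiles.
for box in tile_boxes(1024, 1024):
    print(box)
```

Each box would then be cropped from an upscaled base image, passed through img2img at the size the model was trained on, and pasted back with a feathered mask over the overlap region.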

It would be nice to hear of successful techniques of others.

konimaki2022 commented 1 year ago

After much experimentation, I would say that it is about the patterns and our interpretation of them: in some cases I have achieved successful compositions at high resolutions, in others a resounding failure.

It is much easier for us to recognize strange patterns in faces, bodies, animals, and text than in things such as forests, cities with buildings, landscapes, objects, or monsters.

The model has a better chance of success with the second group than with the first, since in essence it is filling the space with what it has learned. The patterns learned from a 512x512 image of a forest are much easier to enlarge to 1024x1024 by repeating them than those of a face or an animal, where we will obviously detect something weird if they are repeated.

It's all about probability and sometimes it's possible to generate large images without such strange results. Take a look: https://imgur.com/a/K0mzou5

bitRAKE commented 1 year ago

I have an album on facebook where I gather size anomalies: https://www.facebook.com/media/set/?set=a.10159796081345272

Some are, as you said, atmospheric-type renders, but others are just amazing prompts. For example, the viking women frequently produce decent renders at larger sizes.

realistic photo of a majestic viking woman, unbothered, battle-scarred mind-blowing details, hyperrealism, highly detailed face, ethereal, sadness, luxury, ominous, scarred, highly detailed, viking attire, cinematic, 16k, 1080s, smooth, sharp focus, by stanley artgermm, wlop, trending on deviantart, trending on artstation, digital art, smooth gradients, depth of field, shot on canon camera

This one is kind of nice - it always produces a framed-in fantasy atmosphere:

a full body high detail fantasy portrait oil painting illustration of a single beautiful bard woman by justin sweet and artgerm with face and body clearly visible, in a scenic background, pretty eyes, realistic proportions, d & d, rpg, forgotten realms, artstation trending, high quality, sombre mood, artstation trending, muted colours, entire person visible!

I've sorted through thousands of images, and some prompts just seem to resonate with the model. Or have I gotten lucky seeds, or is it personal bias? When I do a bulk run overnight, the failures all seem to double, triple, or further repeat in the same place, which is very prevalent when there is a solitary subject.

My curiosity is just wanting to better understand what the model is doing outside its training expectation.

bitRAKE commented 1 year ago

The images reminded me of this ShaderToy: https://www.shadertoy.com/view/ldlSzX

So, I'm wondering if another model can target the interference pattern to produce better images at arbitrary scale. Or maybe interference could be incorporated into the existing diffusion model to remove the resolution dependence (replacing the latent space).

Tetsujinfr commented 1 year ago

Got the same "issues" with higher-res images. This repo fork is quite amazing, but it turns out to be especially useful for keeping RAM usage lower for 512x512 image generation.

[Attached images: seed_35_00018, seed_35_00028]