I’ve run into similar problems myself before, but it’s kind of difficult to reproduce. Do you have some example feedback for which this reliably happens? Or some generation parameters to reproduce the behavior?
Unfortunately, no, that's part of the problem - it appears seemingly at random. I'll try to create some kind of repro case.
I strongly suspect it happens when the generated images become increasingly similar to the feedback images. Essentially a feedback loop that leads to UNet neuron death.
Yes, on one hand that sounds plausible, but on the other hand I’ve tested the plugin by replicating one image ~10 times without the model burning out. I suspect it has something to do with specific features in the image. For example, dark patches with one solid color could be disproportionately influential on the generated image.
Yeah, I had that thought too, but sometimes it's a dark or contrasty image that causes this, and sometimes a very non-contrasty, flat one without histogram extremes... I thought of testing it by manually tweaking feedback images, or even producing synthetic ones to see what happens, but haven't had time to do it yet.
Yes, something like that would be awesome to have. I think once we have one or a few reproducible cases, it should be possible to get to the bottom of this and find a fix. I’ll try to find some time to investigate in the next few days.
(WARNING: super long post)
I think I might be onto something... I used this as a dumb, simple testbed for synthetic feedback:
I'm only using positive prompts and only positive feedback, to simplify things. The first thing to confirm was how easily simple properties of the feedback image, like saturation, contrast and hue, transfer onto the generated images. So I took that image, created higher-contrast, lower-contrast, higher-saturation, lower-saturation, and hue-shifted versions, and ran all of the variants with the same seeds.
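For reference, the variants were just produced with basic image ops - a minimal Pillow sketch of the kind of script involved (the enhancement factors, hue offset and filenames below are placeholders, not the exact values I used):

```python
# Sketch only: generate contrast / saturation / hue variants of a feedback image.
# Factors and filenames are illustrative placeholders.
from PIL import Image, ImageEnhance
import numpy as np

src = Image.open("feedback.png").convert("RGB")

variants = {
    "contrast_low":    ImageEnhance.Contrast(src).enhance(0.6),
    "contrast_high":   ImageEnhance.Contrast(src).enhance(1.6),
    "saturation_low":  ImageEnhance.Color(src).enhance(0.5),
    "saturation_high": ImageEnhance.Color(src).enhance(1.8),
}

# Hue shift: rotate the H channel in HSV space by a fixed offset.
hsv = np.array(src.convert("HSV"))
hsv[..., 0] = ((hsv[..., 0].astype(int) + 40) % 256).astype(np.uint8)
variants["hue_shift"] = Image.fromarray(hsv, mode="HSV").convert("RGB")

for name, img in variants.items():
    img.save(f"feedback_{name}.png")
```

Each variant was then run through the same prompt and seeds, giving the batches below.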
Original batch:
Lower contrast feedback:
Higher contrast feedback:
Lower saturation:
Higher saturation:
Hue shift:
The conclusion: changes in hue, contrast and saturation transfer VERY easily from feedback images to generated images, and higher values seem to be especially sticky. So it's kind of obvious now why generations sometimes drift off into full-on burnout.
The next thing I wanted to test was how much similarity (similar content) between the feedback image and the images generated by the prompt affects the result, so I fabricated a feedback image that's just a circle on a background, using the same dominant colors. Kind of an extreme simplification of the generated images:
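Something along these lines is enough to fabricate that kind of image (Pillow sketch; the colors and radius are rough placeholders for the dominant sky/grass colors, not the exact values used):

```python
# Sketch only: a single circle on a flat background as a synthetic feedback image.
from PIL import Image, ImageDraw

W, H = 512, 512
bg_color = (110, 150, 210)     # placeholder "sky" blue
circle_color = (70, 120, 60)   # placeholder "grass" green

img = Image.new("RGB", (W, H), bg_color)
draw = ImageDraw.Draw(img)
r = 160
draw.ellipse((W // 2 - r, H // 2 - r, W // 2 + r, H // 2 + r), fill=circle_color)
img.save("synthetic_feedback_circle.png")
```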
This is the result with it as feedback:
Lowered contrast:
Increased contrast:
Desaturated:
Saturated:
Hue:
It's fairly clearly evident that most of the effect is on the sky and grass, i.e. areas that match by complexity / surface area. So, in conclusion: global properties like contrast, hue, saturation, etc. transfer easily to the generated images, and the effect gets progressively stronger the higher the similarity between the subjects or elements.
What to do about it? No clue at the moment...
Another, unrelated thing I've noticed by accident that I can't quite understand... Using images generated by model A as feedback for model B often (always?) results in glitches and complete gibberish. I'm talking about "related" models here, e.g. SD1.5 and a fine-tuned SD1.5.
Very interesting, thanks for putting this together! I’ve definitely seen even more extreme examples where patches of the image are completely black or where the brightness is so low that the entire image is essentially black. I haven’t really thought about it in terms of contrast.
As far as finding a solution goes, I think it will require looking for a pattern in the model activations that’s different between burned-out images and normal images. It might also help to find more extreme examples, because there it might be more obvious where exactly things go wrong.
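A rough way to start on that might be to log per-layer activation magnitudes with forward hooks for one normal and one burned-out run and diff the two. A minimal sketch, assuming the UNet is reachable as a plain torch.nn.Module (nothing plugin-specific):

```python
# Sketch only: record mean absolute activation per conv/linear layer during a
# generation, so a "normal" run can be compared against a "burned out" one.
import torch

def collect_activation_norms(unet: torch.nn.Module):
    stats, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            if torch.is_tensor(output):
                stats.setdefault(name, []).append(
                    output.detach().float().abs().mean().item()
                )
        return hook

    for name, module in unet.named_modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            handles.append(module.register_forward_hook(make_hook(name)))
    return stats, handles

# Usage idea: call once per run, generate, then compare the per-layer means of
# the two stats dicts and remove the hooks afterwards with handle.remove().
```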
I’ve also never noticed glitches from using a different model to generate the feedback images. Do you have some examples of that? It might well be related to the burnout issue.
Yeah, these were very mild examples, but also, it didn't take a lot to produce them. I've seen complete burnouts too.
I'll try to make a repro case for the different-model glitches too.
What I think might help is bounding the maximum weight of the feedback to some range. The question is what that range should be... I think I saw a similar issue way back, a year or so ago in the early days of SD, when someone was trying to prevent image "burn" at high CFG scales. It turned out not to be as easy as it initially sounds, because there's no good way to map latent-space value ranges to image value ranges (i.e. 0-255) consistently across different generations.
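Just to illustrate the kind of bounding I mean - a per-sample, percentile-based clamp on the latents (in the spirit of the dynamic thresholding people tried against high-CFG burn) rather than a fixed global range. A sketch only, not something that exists in the plugin:

```python
# Sketch only: clamp latent values to a per-sample percentile threshold instead
# of a hard-coded range, since latent value ranges vary between generations.
import torch

def dynamic_clamp(latents: torch.Tensor, percentile: float = 0.995) -> torch.Tensor:
    flat = latents.flatten(1).abs()
    # per-sample threshold, never tighter than +/-1
    s = torch.quantile(flat, percentile, dim=1).clamp(min=1.0)
    s = s.view(-1, *([1] * (latents.dim() - 1)))
    return latents.clamp(-s, s)
```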
I've been able to find a pair of feedback images that reliably produces burnout, so I'm posting them here for future reference. Some observations I've been able to make so far:
award winning oil painting of a corgi sitting in a garden, soothing tones, colorful, high contrast, (masterpiece), (best quality:1.2), absurdres, intricate details, wide angle, in the style of monet
black background
Feedback image 1:
Feedback image 2:
Example generation with/without feedback:
Interesting... The pattern visible in the worst examples is similar to what you sometimes get with img2img when accidentally using the same seed on both the original image and the current generation.
Oh interesting, this might be an indication that the predicted noise updates are too large or otherwise out of distribution. I’m not very familiar with the img2img failure mode, but do you know if adaptive CFG scale and/or more denoising steps help at all?
They usually just make the pattern more or less obvious. I have no idea what causes it, but I think it's probably reinforcement of existing noise in some way.
As in, any leftover noise is very similar and gets signal-boosted through repetition.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
It's been a while, but I've finally found a way to mitigate this issue and mostly prevent it from happening.
I seem to be getting horribly burnt-out images no matter what I do. I've tried eliminating all extensions, I've tried custom (non-anime) and stock VAEs, and I'm not using any hypernetworks or LoRAs. I've tried with my own fine-tuned (generalist) models as well as stock SD1.5 and 2.0.
I first thought that using too many upvote/downvote images was causing the burnout, which would make sense - eventually, one would expect the effect to be somewhat similar to neuron death or divergence when training goes wrong, or to plain overtraining.
However, it can happen even with just one upvote or downvote image. Generating upvote/downvote images with one model and then applying them to another model seems to make it happen less often. The absolute worst cases are >3 images generated with the same model and prompt they're applied to.
What am I doing wrong / is there an expected procedure for this? Or is there a bug somewhere / a lack of normalization of some kind leading to the UNet weights blowing up?