dvruette / sd-webui-fabric

MIT License
403 stars 23 forks

Burnout / neuron death #34

Closed orcinus closed 7 months ago

orcinus commented 1 year ago

I seem to be getting horribly burnt-out images no matter what I do. I've tried disabling all other extensions, I've tried a custom (non-anime) VAE and the stock VAE, and I'm not using any hypernetworks or LoRAs. I've tried my own fine-tuned (generalist) models as well as stock SD 1.5 and 2.0.

I first thought that using too many upvote/downvote images causes the burnout, which would make sense: eventually one would expect an effect somewhat similar to neuron death or divergence when training goes wrong, or to plain overtraining.

However, it can happen with just a single upvote or downvote image too. Generating the upvote/downvote images with one model and then applying them to a different model seems to make it happen less often. The absolute worst cases are more than 3 feedback images generated with the same model and prompt they are then applied to.

What am I doing wrong / is there an expected procedure for this? Or is there a bug somewhere, or a lack of normalization of some kind, leading to the UNet activations blowing up?

dvruette commented 1 year ago

I’ve run into similar problems myself before, but it’s kind of difficult to reproduce. Do you have some example feedback for which this reliably happens? Or some generation parameters to reproduce the behavior?

orcinus commented 1 year ago

Unfortunately, no, and that's part of the problem: it appears very randomly. I'll try to create some kind of repro case.

I strongly suspect it happens when the generated images become increasingly similar to the feedback images. Essentially a feedback loop that leads to UNet "neuron death".

dvruette commented 1 year ago

Yes, on one hand that sounds plausible, but on the other hand I’ve tested the plugin by replicating one image ~10 times without the model burning out. I suspect it has something to do with specific features in the image. For example, dark patches of one solid color could be disproportionately influential on the generated image.

orcinus commented 12 months ago

Yeah, I had that thought too, but sometimes it's a dark or contrasty image that causes this, and sometimes a very flat, non-contrasty one without histogram extremes... I've thought about testing it by manually tweaking feedback images, or even producing synthetic ones, to see what happens, but haven't had the time to do it yet.

dvruette commented 12 months ago

Yes, something like that would be awesome to have. I think once we have one or a few reproducible cases, it should be possible to get to the bottom of this and find a fix. I’ll try to find some time to investigate in the next few days.

orcinus commented 12 months ago

(WARNING: super long post)

I think I might be onto something... I used this as a dumb, simple testbed for synthetic feedback:

FABRIC_orig

I'm only using positive prompts and only positive feedback, to simplify things. The first thing to confirm was how easily simple properties of the feedback image, like saturation, contrast and hue, transfer onto the generated images. So I took that image, created higher-contrast, lower-contrast, higher-saturation, lower-saturation and hue-shifted versions, and ran all of the variants with the same seeds.
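The variants can be made with something as simple as this Pillow snippet; the enhancement factors, the hue offset and the file names below are just placeholders, not the exact values used:

```python
# Rough sketch of producing the feedback variants with Pillow.
# Factors, hue offset and file names are placeholders.
from PIL import Image, ImageEnhance

img = Image.open("FABRIC_orig.png").convert("RGB")

variants = {
    "contrast_low": ImageEnhance.Contrast(img).enhance(0.5),
    "contrast_high": ImageEnhance.Contrast(img).enhance(1.5),
    "saturation_low": ImageEnhance.Color(img).enhance(0.5),
    "saturation_high": ImageEnhance.Color(img).enhance(1.5),
}

# Hue shift: rotate the H channel in HSV space by a fixed offset.
h, s, v = img.convert("HSV").split()
h = h.point(lambda x: (x + 32) % 256)
variants["hue_shift"] = Image.merge("HSV", (h, s, v)).convert("RGB")

for name, variant in variants.items():
    variant.save(f"FABRIC_{name}.png")
```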

Original batch:

FABRIC_result_orig

Lower contrast feedback:

FABRIC_result_decontrast

Higher contrast feedback:

FABRIC_result_contrast

Lower saturation:

FABRIC_result_desaturate

Higher saturation:

FABRIC_result_saturate

Hue shift:

FABRIC_result_hueshift

The conclusion: changes in hue, contrast and saturation VERY easily transfer from feedback images to generated images, and higher values seem to be especially sticky. So it's now kind of obvious why generations sometimes drift off into full-on burnout.

The next thing I wanted to test was how much similarity (i.e. similar content) between the feedback image and the images generated by the prompt matters, so I fabricated a feedback image that's just a circle on a background, using the same dominant colors. Kind of an extreme simplification of the images being generated:

FABRIC_fabricated1
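A few lines of Pillow are enough to produce such an image; the colors below are just placeholders for the dominant colors of the generations:

```python
# Sketch of a fabricated feedback image: one circle on a flat background.
# The two colors are placeholders for the dominant colors of the generations.
from PIL import Image, ImageDraw

W, H = 512, 512
background = (110, 150, 200)  # placeholder "sky" color
subject = (90, 140, 70)       # placeholder "subject" color

img = Image.new("RGB", (W, H), background)
ImageDraw.Draw(img).ellipse((W // 4, H // 4, 3 * W // 4, 3 * H // 4), fill=subject)
img.save("FABRIC_fabricated1.png")
```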

This is the result with it as feedback:

fabricated_result_original

Lowered contrast:

fabricated_result_decontrast

Increased contrast:

fabricated_result_contrast

Desaturated:

fabricated_result_desaturated

Saturated:

fabricated_result_saturated

Hue shifted:

fabricated_result_hue

It's fairly evident that most of the effect lands on the sky and grass, i.e. areas that match by complexity / surface area. So, in conclusion: global properties like contrast, hue, saturation, etc. easily transfer to the generated images, and the effect gets progressively stronger the more similar the subjects or elements are.

What to do about it? No clue at the moment...

orcinus commented 12 months ago

Another, unrelated thing I've noticed by accident and can't quite understand... Using images generated by model A as feedback for model B oftentimes (always?) results in glitches and complete gibberish. I'm talking about "related" models here, e.g. SD 1.5 and a fine-tuned SD 1.5.

dvruette commented 12 months ago

Very interesting, thanks for putting this together! I’ve definitely seen even more extreme examples where patches of the image are completely black or where the brightness is so low that the entire image is essentially black. I haven’t really thought about it in terms of contrast.

As far as finding a solution goes, I think it will require looking for a pattern in the model activations that’s different between burned-out images and normal images. It might also help to find more extreme examples, because there it might be more obvious where exactly things go wrong.
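Something as simple as forward hooks on the UNet might be enough to spot where the distributions diverge. A rough sketch (not part of the extension; it assumes the UNet is a plain torch.nn.Module, and the layer-name filter is just a guess):

```python
# Rough sketch: record per-layer activation statistics via forward hooks, so a
# burned-out generation can be compared against a normal one layer by layer.
import torch

def attach_stat_hooks(unet, stats):
    """Append (mean_abs, max_abs) of each hooked layer's output into `stats`."""
    handles = []
    for name, module in unet.named_modules():
        # Restrict to attention and conv layers to keep the log manageable.
        if not any(key in name for key in ("attn", "conv")):
            continue

        def hook(mod, inputs, output, name=name):
            if torch.is_tensor(output):
                stats.setdefault(name, []).append(
                    (output.abs().mean().item(), output.abs().max().item())
                )

        handles.append(module.register_forward_hook(hook))
    return handles  # call handle.remove() on each one when finished
```

Running one generation with the burnout feedback and one without, then diffing the per-layer magnitudes, should show whether (and where) the injected feedback pushes the activations out of their normal range.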

I’ve also never noticed glitches from using a different model to generate the feedback images. Do you have some examples of that? It might well be related to the burnout issue.

orcinus commented 12 months ago

Yeah, these were very mild examples, but also, it didn't take a lot to produce them. I've seen complete burnouts too.

I'll try to put together a repro case for the different-model glitches too.

What I think might help is bounding the maximum weight of the feedback to some range. The question is what that range should be... I think I've seen a similar issue a year or so back, in the early days of SD, when someone was trying to prevent image "burn" at high CFG scales. It turned out not to be as easy as it sounds, because there's no good way to map latent-space value ranges to image value ranges (i.e. 0-255) consistently across different generations.
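One way to make the "bound it to some range" idea concrete without mapping latents to pixel values would be to renormalize the feedback-modified tensor against its no-feedback counterpart, similar in spirit to the CFG-rescale trick. Just a sketch; rescale_to_baseline and its parameters are made up for illustration and don't exist in the extension:

```python
# Sketch only: limit the feedback's influence by rescaling the modified tensor
# so its per-sample std matches the no-feedback baseline, instead of trying to
# map latent-space values to image-space ranges. Names/values are hypothetical.
import torch

def rescale_to_baseline(modified: torch.Tensor, baseline: torch.Tensor,
                        strength: float = 1.0) -> torch.Tensor:
    dims = tuple(range(1, modified.ndim))  # reduce over all but the batch dim
    std_mod = modified.std(dim=dims, keepdim=True)
    std_ref = baseline.std(dim=dims, keepdim=True)
    rescaled = modified * (std_ref / (std_mod + 1e-8))
    # Blend so the correction can be tuned (0 = off, 1 = full rescale).
    return strength * rescaled + (1.0 - strength) * modified
```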

dvruette commented 11 months ago

I've been able to find a pair of feedback images that reliably produces burnout, so I'm posting them here for future reference. Some observations I've been able to make so far:

Feedback image 1: 00063-948568555

Feedback image 2: 00074-948568555

Example generation with/without feedback: grid-0031 grid-0030

orcinus commented 11 months ago

Interesting... The pattern visible in the worst examples is similar to what you sometimes get with img2img when accidentally using the same seed on both the original image and the current generation.

dvruette commented 11 months ago

Oh interesting, this might be an indication that the predicted noise updates are too large or otherwise out of distribution (OOD). I’m not very familiar with the img2img failure mode, but do you know if adaptive CFG scale and/or more denoising steps help at all?

orcinus commented 11 months ago

They usually just make the pattern more or less obvious. I have no idea what causes it, but I think it's probably some kind of reinforcement of existing noise.

As in, any leftover noise is very similar and gets signal-boosted through repetition.

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.


dvruette commented 7 months ago

It's been a while, but I've finally found a way to mitigate this issue and mostly prevent it from happening.