Implement training Diffusion With Offset Noise to make it possible to generate dark and light images easily

ProGamerGov commented 1 year ago

I recently came across an article that helps address a major issue with training Stable Diffusion models: https://www.crosslabs.org/blog/diffusion-with-offset-noise

Basically, creating the noise in the training loop the default way leaves the model unable to properly handle really bright and really dark images:

noise = torch.randn_like(latents)

Given a standard color range of 0 for black to 1 for white, the average brightness of the images produced by the default training setup stays close to 0.5.

In order to make it possible to reliably create images above or below the 0.5 average, the noise calculation can be changed to:

noise = torch.randn_like(latents) + 0.1 * torch.randn(latents.shape[0], latents.shape[1], 1, 1, device=latents.device)

Or, taking the device from the accelerator object instead:

noise = torch.randn_like(latents) + 0.1 * torch.randn(latents.shape[0], latents.shape[1], 1, 1, device=accelerator.device)
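A rough way to see why the extra term matters: the spatial mean of a latent channel is an average of many independent noise values, so under the default noise it barely moves. The sketch below computes the standard deviation of that mean analytically. It assumes a 64x64 latent (a 512x512 image with Stable Diffusion's 8x VAE downscaling); the sizes and the helper name channel_mean_std are illustrative assumptions, not values taken from the trainer.

```python
import math


def channel_mean_std(height: int, width: int, offset_strength: float = 0.0) -> float:
    """Standard deviation of the spatial mean of the noise in one latent channel.

    The mean of H*W independent N(0, 1) values has variance 1 / (H*W).
    Offset noise adds one shared N(0, offset_strength**2) value to every
    pixel of the channel, which adds offset_strength**2 to that variance.
    """
    return math.sqrt(1.0 / (height * width) + offset_strength ** 2)


# Assumed 64x64 latent, i.e. a 512x512 image after 8x downscaling.
print(channel_mean_std(64, 64))       # 0.015625
print(channel_mean_std(64, 64, 0.1))  # roughly 0.101
```

Under these assumptions, the 0.1 offset makes the per-channel mean of the noise vary about six times more than it does with plain Gaussian noise, which is the low-frequency variation the article argues the model otherwise rarely sees.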

This results in the following images using the same prompts as the first example:

It looks like this change can be implemented in Stable Tuner by simply changing this line: https://github.com/devilismyfriend/StableTuner/blob/main/scripts/trainer.py#L1372

~~https://github.com/devilismyfriend/StableTuner/blob/main/scripts/trainer.py#L2724~~

But it might be better to have it as an optional feature, as it may have some drawbacks.
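One way the change could be kept optional is to gate it behind a strength value that defaults to zero, so existing behaviour is unchanged unless the user opts in. This is a minimal sketch, not StableTuner's actual code; the function name sample_noise and the parameter offset_strength are hypothetical.

```python
import torch


def sample_noise(latents: torch.Tensor, offset_strength: float = 0.0) -> torch.Tensor:
    """Return training noise shaped like `latents`.

    With offset_strength == 0 this is the default torch.randn_like noise.
    With offset_strength > 0, one extra random value per (sample, channel)
    is broadcast over height and width, as in the offset-noise snippet above.
    """
    noise = torch.randn_like(latents)
    if offset_strength > 0:
        offset = torch.randn(
            latents.shape[0], latents.shape[1], 1, 1,
            device=latents.device, dtype=latents.dtype,
        )
        noise = noise + offset_strength * offset
    return noise


# Hypothetical usage inside a training step:
latents = torch.zeros(2, 4, 64, 64)  # stand-in for VAE-encoded images
noise = sample_noise(latents, offset_strength=0.1)
```

Defaulting to zero means a run that never sets the option draws exactly the same noise as before, which makes it easier to compare results with the feature on and off.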

vurt72 commented 1 year ago

This feature does not work. Or at least it needs a guide or pop-up explanation in StableTuner of how it is supposed to be used.

Turning it on seems to remove the ability to recognize the token word; at least, the samples no longer resemble the training images at all. I've tried it four times now, on and off, and off always works as expected. On the last try I let it run for 3h, and after 50% it was still garbage; it's just using images from the dataset I'm training on (v2-1_768-ema-pruned). I also tried weight 0.1 for the noise and 0.9 for the other weight, with no success either.

Luke2642 commented 1 year ago

Perhaps a more flexible way to implement this would be to allow the noise scheduler to be a parameter set in the UI?

At the moment trainer.py sets one scheduler only:

noise_scheduler = DDPMScheduler.from_config(args.pretrained_model_name_or_path, subfolder="scheduler")

This offset noise, along with the other papers mentioned (which use blurring etc.), is undoubtedly going to open up a whole new class of noise options.
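If the kind of noise were exposed as a setting, one possible shape is a small registry keyed by name, so new noise options can be added without touching the training loop. This is only a sketch of the idea: NOISE_SAMPLERS and the --noise_type flag are hypothetical names, not existing StableTuner options, and it selects how the training noise is sampled rather than swapping out the DDPMScheduler itself.

```python
import argparse

import torch


def gaussian_noise(latents: torch.Tensor) -> torch.Tensor:
    # Default behaviour: independent N(0, 1) noise per element.
    return torch.randn_like(latents)


def offset_noise(latents: torch.Tensor, strength: float = 0.1) -> torch.Tensor:
    # Adds one shared random value per (sample, channel) on top of the default noise.
    offset = torch.randn(
        latents.shape[0], latents.shape[1], 1, 1,
        device=latents.device, dtype=latents.dtype,
    )
    return torch.randn_like(latents) + strength * offset


# Hypothetical registry; further noise types would be added here.
NOISE_SAMPLERS = {"gaussian": gaussian_noise, "offset": offset_noise}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--noise_type", choices=sorted(NOISE_SAMPLERS), default="gaussian")
    return parser


args = build_parser().parse_args(["--noise_type", "offset"])
latents = torch.zeros(2, 4, 8, 8)  # stand-in for VAE-encoded images
noise = NOISE_SAMPLERS[args.noise_type](latents)
```

A UI could populate a drop-down from the registry keys, so each new noise function becomes selectable as soon as it is registered.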

And this noise offset has shown that fine-tuning for a few thousand steps is sufficient to teach the old model new tricks!

I wonder if a random mesh warp, or shuffling the latents, could be a really interesting noise scheduler to fine tune a model with? Along with histogram warping, rather than just offset noise.

I think it was Nvidia that showed that training different models for different stages of the denoising process reduced steps and improved quality. I guess you'd train a 'heavy noise' and a 'light noise' model, turn the diff from the base model into LoRAs, and apply a different denoiser LoRA for each step. Making this stuff up as I go along though. Exciting times!

devilismyfriend / StableTuner, issue #100