Suggestion: add new, less aggressive, sampler with ToMe

recoilme commented 1 year ago

So. Now Tome on the one hand behaves somewhat aggressively, throwing out a significant portion of tokens not directly related to the main composition. This leads to the following disadvantages:

A significant portion of the parts are being washed out. Example: without tome, tome 0.5, tome 0.3

But there are pluses as well:

increased generation speed
allocation of the kernel of the composition

If you reduce the number of steps, we often lose the essence (because we try to generate everything at once). And this is where ToMe can help us. At the early stages of generation, using tome will reduce the amount of noise and slightly increase the speed. I assume that it will lead to a more coherent and high-quality composition as a whole. Fewer tokens means better composition.

Let's say I generate with DPM++ SDE Karass with 30 steps. Now the tome is used in the whole generation process. My propose to use it exponentially (if possible), for example, at steps:

1 - 5 - .75 compression
6 - 10 - .5 compression
10 - 15 - 0.25 compression
16 - 30 - 0 compression

This will allow you to focus on the main part of the composition in the early stages, and increase the quality and amount of detail in the later stages.

In my opinion, a more correct method would be to add Tome as a new Sampler, ideally in DPM++ SDE Karass + tome (it is the slowest and highest quality sampler, by my taste).

What do you think?

dbolya commented 1 year ago

Hi, thanks for the suggestion!

I actually tried something similar in the paper:

It did improve the quantitative numbers somewhat, but I found the small gain in FID was not worth the headache it is to implement. We'd need to know the current diffusion step and the total number of diffusion steps inside the model, which is not really something we have access to (unless I was missing a simple way to query that information). Thus, I omitted it from this version of the code.

Now, it could be that the results are better when used on big images (I only tested 512x512 images for that experiment). So maybe that's something to test?

Another idea is to not apply the same amount of merging to every layer. For instance, there are 4 layers ToMe is applied to right now. What if we applied more on the first layers and less on the later layers?

recoilme commented 1 year ago

Hi, thanks for response! Ok, at first of all we need good samples for test.

For nice pictures we need:

two good different type models (photorealistic, illustration)
right prompts (with shadows, details and so on)
dimension (SD trained on 512512, so we need little more, 640640 will be enougth)
sampler (DPM++ SDE Karass, 20 steps)

Second part, about steps. I suggest to try to make a tome optimization inside the sampler (sampler know step i think), for example DPM++ SDE Karras. Try create a new sampler, with tome?

I will try to find more details about samplers and add minimal prompts examples

recoilme commented 1 year ago

Prompts examples.

Minimal negative prompt is: (low quality:1.4), (worst quality:1.4)

Positive prompts must contains something like this:

highres, masterpiece, perfect lighting, bloom, cinematic lighting, <SOME CONTENT>,(masterpiece:1.3), (best_quality:1.3), (ultra_detailed:1.3), 8k, extremely_clear, realism, (ultrarealistic:1.3)

SOME CONTENT example: lion, whale, seashell, coral, clownfish, octopus

Example (lion): highres, masterpiece, perfect lighting, bloom, cinematic lighting, lion ,(masterpiece:1.3), (best_quality:1.3), (ultra_detailed:1.3), 8k, extremely_clear, realism

Animatrix image (illustration model): 00013-0

Colorful image (photorealism): 00014-0

Steps - around 25

recoilme commented 1 year ago

Samplers

Example of patched sampler: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/8457

Github with samplers: https://github.com/crowsonkb/k-diffusion (Katherine Crowson, @crowsonkb - developer of most famous diffusion samplers, including DPM++ SDE Karras)

May be she may get some suggestions (i'm not python dev. I just want to generate waifu)

recoilme commented 1 year ago

And last, about "Another idea is to not apply the same amount of merging to every layer. For instance, there are 4 layers ToMe is applied to right now. What if we applied more on the first layers and less on the later layers?"

Is it about try to play with different max_downsample?

dbolya / tomesd

Suggestion: add new, less aggressive, sampler with ToMe #28