dribnet / clipit

CLIP + VQGAN / PixelDraw

Best Optimiser for generation #7

Closed varkarrus closed 3 years ago

varkarrus commented 3 years ago

After some experimentation, I've found that the best optimiser to use (at least, for standard VQGAN, not pixel or clipdraw) is DiffGrad, with a step size somewhere around 1. It gives a perfect balance of structure and detail and very rarely screws up!
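For anyone who wants to try this, DiffGrad ships in the torch-optimizer package. Here's a minimal sketch of wiring it up at step size 1, using a dummy latent and a stand-in loss (the real clipit loop optimises VQGAN latents against a CLIP loss; `z` and the loss below are placeholders of mine, not clipit's actual names):

```python
# pip install torch_optimizer
import torch
import torch_optimizer

# Dummy stand-in for the VQGAN latent tensor that clipit optimises.
z = torch.randn(1, 256, 16, 16, requires_grad=True)

# DiffGrad at a step size (learning rate) of ~1, as suggested above.
opt = torch_optimizer.DiffGrad([z], lr=1.0)

for _ in range(500):            # 500 iterations, as in the example run
    opt.zero_grad()
    loss = (z ** 2).mean()      # stand-in for the real CLIP-vs-prompt loss
    loss.backward()
    opt.step()
```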

Prompt: A village inside of a sewer, inhabited by humanoid tardigrades. 8K HD detailed Wallpaper, digital illustration, artstation.

[image]

Ran for 500 iterations, using sflckr.

dribnet commented 3 years ago

Thanks. I'd be interested in hearing if other people have also had luck with these settings, and if so perhaps I'll make these the default.

varkarrus commented 3 years ago

Actually, I've got more to add now.

What I do now is have three MSE-equalization epochs. I start with three sets of 100 iterations using Adagrad at step size 2. During this phase, init_weight is set to 3.75, then 2.5, then 1.25 (these should be decreased if fewer than three CLIP models are used). After that, it switches to DiffGrad as normal.

At the end of each epoch, the optimizer is reset – essentially the same as starting from scratch with the previous iteration as the init_image. I had to hack together some crappy code to do this without reloading the entire VQGAN model each time. The end result is that the model stays volatile early on, which prevents it from blocking out anything in too much detail and improves overall structure.
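Roughly, the schedule looks like this (a sketch of the scheme as described, not the actual hacked-together code; `z` and `clip_loss` are placeholder names):

```python
import torch
import torch_optimizer

# z is the VQGAN latent being optimised; clip_loss stands in for the
# CLIP-vs-prompt loss.
z = torch.randn(1, 256, 16, 16, requires_grad=True)

def clip_loss(z):
    return (z ** 2).mean()      # stand-in for the real CLIP loss

# Three MSE-equalization epochs with decreasing init_weight.
for init_weight in (3.75, 2.5, 1.25):
    z_init = z.detach().clone()                # previous result acts as the init_image
    opt = torch.optim.Adagrad([z], lr=2.0)     # fresh optimizer = full state reset
    for _ in range(100):
        opt.zero_grad()
        loss = clip_loss(z) + init_weight * torch.nn.functional.mse_loss(z, z_init)
        loss.backward()
        opt.step()

# After the equalization epochs, continue with DiffGrad as normal.
opt = torch_optimizer.DiffGrad([z], lr=1.0)
```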

The concept was taken from this (poorly markdowned) colab notebook: https://colab.research.google.com/drive/1gFn9u3oPOgsNzJWEFmdK-N9h_y65b8fj?usp=sharing

[image]

The Cookie Monster as a monstrous beast. 8K HD Eldritch Horror Art Nouveau. Artstation.

dribnet commented 3 years ago

Thanks - these experiments are super awesome.

I think what I'd like to do is create a branch with a slightly stronger baseline for learning_rate, with the code refactored to make it easier for you to try your own settings. Then in a future release we can offer these settings via an option (or perhaps make them the default if they prove popular enough).

If this sounds good to you, let me know. I need to take care of #6, #9, and #10 first, and then I can make a branch downstream with these changes for you to test / tweak.

dribnet commented 3 years ago

I have now pushed a new branch learning_rates which has been refactored to allow experimentation with different learning rate schemes. I've also implemented early stopping in this branch, which I am excited about. Currently in this branch:

Additionally:

This provides a pretty strong baseline for me. For testing, I propose we use reference prompts and reference settings and then report loss and iterations. For example, here are my results for "The Cookie Monster as a monstrous beast. 8K HD Eldritch Horror Art Nouveau. Artstation.":

| Setting | Iterations | Loss |
| --- | --- | --- |
| `--quality better --aspect square` | 319 | 1.822 |
| `--quality best --aspect square` | 247 | 1.626 |

So the goal is either to reliably get a lower loss, or perhaps the same loss in fewer iterations. We can of course add more reference prompts and settings if we want. And for the curious, here are my versions of those images.

[image: cookie_11]

[image: cookie_10]
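As an aside on the early stopping mentioned above: a patience-style loop over the reported loss is roughly what that amountsts to. A minimal sketch – the patience window and improvement threshold here are illustrative values, not the branch's actual criterion – which conveniently returns the same (iterations, loss) pair used in the table above:

```python
def run_with_early_stopping(train_step, max_iterations=500,
                            patience=50, min_delta=1e-4):
    """Run train_step() until the loss stops improving.

    train_step is assumed to perform one optimisation step and return
    the scalar loss; patience/min_delta are illustrative, not the
    values used in the learning_rates branch.
    """
    best_loss, stall = float("inf"), 0
    for iteration in range(1, max_iterations + 1):
        loss = train_step()
        if loss < best_loss - min_delta:
            best_loss, stall = loss, 0     # meaningful improvement: reset counter
        else:
            stall += 1
        if stall >= patience:              # no improvement for a while: stop
            break
    return iteration, best_loss
```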

@varkarrus - if you have a chance, check out this branch and maybe have a go at implementing your schemes to see if you get better scores. Before I release this I'd want to clean up the options (make early stopping an option, etc.), but it should be good enough for testing, and it's already refactored in ways that make resetting the optimisers, etc. easier.

varkarrus commented 3 years ago

Yo that looks sick! I'll take a look at this branch, but I should mention I can't get `--quality best` working on Colab. It runs OOM even on a V100... :(

dribnet commented 3 years ago

I'm going to go ahead and close the issue here as I have recently moved the core library to the pixray repo. Note that this move also included going live with learning rate decay. For now early stopping is disabled, so decay instead follows a fixed schedule with drops configured via learning_rate_drops, which defaults to 80% and 90% of the iterations. So by default, 80% of the way through the run the learning rate drops by a factor of 10, and at 90% it drops by another factor of 10.
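That fixed schedule maps directly onto PyTorch's built-in MultiStepLR; here's a sketch under the defaults described above (the optimiser, base learning rate, and loss are placeholders, not pixray's actual code):

```python
import torch

iterations = 500
z = torch.randn(1, 256, 16, 16, requires_grad=True)   # placeholder latent
opt = torch.optim.Adam([z], lr=0.1)                   # placeholder optimiser/lr

# learning_rate_drops defaults to 80% and 90% of the run; each drop
# scales the learning rate by 1/10.
milestones = [int(iterations * 0.8), int(iterations * 0.9)]
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=milestones, gamma=0.1)

for _ in range(iterations):
    opt.zero_grad()
    loss = (z ** 2).mean()      # stand-in loss
    loss.backward()
    opt.step()
    sched.step()                # applies the 1/10 drop at each milestone
```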

So if you'd like to try this again, just open an issue in the pixray repo – the code there should be much easier to experiment with, as the optimisers have already been refactored so that they can be reset, etc. during the run.