Jack000 / glid-3-xl-stable

stable diffusion training
MIT License

some questions about image size #6

Closed AlvL1225 closed 1 year ago

AlvL1225 commented 2 years ago

Hi Jack, the resolutions for the official SD v1.4 (v1.3) model are 512 for the KL-VAE and 64 for the diffusion model, but your README settings are 256 for the KL-VAE and 32 for the diffusion model.

Can these two resolution settings be made to match? Thanks!

Jack000 commented 2 years ago

for sampling, 64 and 32 are identical as long as the attention_resolutions are shifted accordingly:

32px with attention resolutions 32,16,8 = 64px with attention resolutions 64,32,16

I used 32px mostly because it matches the settings in the Compvis code.
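To see why the two settings are equivalent: guided-diffusion-style code converts the pixel values in attention_resolutions into downsample factors relative to the model's image size, and attention is applied wherever the feature map has been downsampled by one of those factors. A minimal sketch of that conversion:

```python
def attention_ds(image_size, attention_resolutions):
    # Convert pixel resolutions into downsample factors, as the
    # guided-diffusion model-creation code does. Attention layers are
    # placed at feature maps downsampled by these factors.
    return sorted(image_size // res for res in attention_resolutions)

# 32px latent with attention at 32,16,8 vs 64px with attention at 64,32,16
print(attention_ds(32, [32, 16, 8]))   # [1, 2, 4]
print(attention_ds(64, [64, 32, 16]))  # [1, 2, 4]
```

Both settings put attention at the same relative depths of the U-Net, which is why sampling behaves identically.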

the KL-VAE is (presumably) trained at 256px. It can be used for any resolution >= 256px (smaller images may have issues, based on my testing)

the image size for the training code is set with --actual_image_size, and yes, to match SD it should be set to 512 (all other flags can stay the same). I should probably change this.
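The arithmetic connecting the two numbers, assuming the standard SD KL-VAE with its factor-of-8 spatial downsampling:

```python
# The SD KL-VAE downsamples images by a factor of 8, so the diffusion
# model operates on latents of size actual_image_size // 8.
VAE_DOWNSAMPLE = 8

def latent_size(actual_image_size):
    return actual_image_size // VAE_DOWNSAMPLE

print(latent_size(256))  # 32 -> the README defaults
print(latent_size(512))  # 64 -> official SD v1.4
```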

AlvL1225 commented 2 years ago

Thanks!

I have another question. What device did you test on, and how fast is it (with the 256 and 32 settings)?

Do you think I could train on a 24GB device (or multiple 3090s) with half the batch size per GPU?

Jack000 commented 2 years ago

Sampling and training speed should be identical to the Compvis repo; it's basically the same code.

I don't think it would be possible to train on a 3090 with this code as-is. It's only off by a little bit though, so it might work with some effort: disable EMA and change the training script to pre-compute the LDM embeddings and store them on disk.

A better solution might be to use deepspeed cpu offloading similar to this repo: https://github.com/afiaka87/latent-diffusion-deepspeed