bmaltais / kohya_ss

Apache License 2.0
9.66k stars 1.24k forks

SDXL Dreambooth (not LoRA) NaN detected in latents #2752

Open hypervoxel opened 2 months ago

hypervoxel commented 2 months ago

Hello, I am able to train a LoRA successfully on an RTX 4090 using the "no half vae" checkbox. However, if I try to train a Dreambooth checkpoint (not LoRA), I get a "NaN detected in latents" error. There is no "no half vae" checkbox in the Dreambooth settings. I have tried adding sdxl_no_half_vae = true to my config.toml and it has no effect. Can anybody help?

b-fission commented 2 months ago

Try putting --no_half_vae on Advanced->Additional Parameters. You shouldn't need to edit the config toml file unless you're reusing the same one for each training run.
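For context on why the flag helps: fp16 can only represent values up to about 65504, and SDXL VAE activations are known to exceed that, overflowing to inf and then producing NaN. A minimal sketch of the numeric failure mode (illustrative only, using NumPy rather than the actual VAE):

```python
import numpy as np

# fp16's largest finite value is 65504; anything bigger overflows to inf.
print(np.finfo(np.float16).max)           # 65504.0

big = np.float16(70000.0)                 # overflows to inf
print(big)                                # inf

latent = big - big                        # inf - inf produces nan
print(latent)                             # nan

# In fp32 the same magnitude is perfectly representable,
# which is effectively what --no_half_vae gives you for the VAE.
safe = np.float32(70000.0) - np.float32(70000.0)
print(safe)                               # 0.0
```

So --no_half_vae keeps the VAE in fp32 even when the rest of training runs in half precision, avoiding the overflow.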

hypervoxel commented 2 months ago

> Try putting --no_half_vae on Advanced->Additional Parameters. You shouldn't need to edit the config toml file unless you're reusing the same one for each training run.

Can you clarify? There is no "no half vae" checkbox in the gui for Dreambooth (not LoRA). How do I put --no_half_vae on Advanced->Additional Parameters?

b-fission commented 2 months ago

Manually put the text in the box like this...

[screenshot: opt]

hypervoxel commented 2 months ago

oh nice! Thank you!! I'll give it a try!

hypervoxel commented 2 months ago

> Try putting --no_half_vae on Advanced->Additional Parameters. You shouldn't need to edit the config toml file unless you're reusing the same one for each training run.

This works! However, it is very slow: with AdamW8bit it was 120 s/it, and with Adafactor it is 62 s/it. Still far too slow to use. Does anyone have good settings for SDXL Dreambooth checkpoint training?

b-fission commented 2 months ago

Turn on Gradient Checkpointing and Full fp16 training
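For anyone reusing a config.toml instead of the GUI, the same settings can be expressed there. This is a sketch; the key names mirror the sd-scripts CLI flags, so check your sd-scripts version if a key is rejected:

```toml
# Sketch of a config.toml fragment (keys mirror the CLI flags).
gradient_checkpointing = true   # recompute activations in backward pass to save VRAM
full_fp16 = true                # keep weights and gradients in fp16
mixed_precision = "fp16"        # required alongside full_fp16
no_half_vae = true              # keep the VAE in fp32 to avoid NaN latents
```

Gradient checkpointing trades a modest amount of recomputation for a large VRAM saving, which is usually what lets full-checkpoint SDXL training fit and run at a reasonable speed on a 24 GB card.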

hypervoxel commented 2 months ago

> Turn on Gradient Checkpointing and Full fp16 training

That worked! Getting 2 it/s. Thank you so much!!!