bmaltais / kohya_ss

Apache License 2.0

Latest Windows 11 update broke Kohya's VRAM usage #2704

Open Ofuld opened 2 months ago

Ofuld commented 2 months ago

I trained a LoRA yesterday in around 45 minutes. Right after that, Windows asked to update, so I did. Today I went back to train another version of my LoRA with the exact same settings, and it's taking 56 hours. The difference is the VRAM usage: before the Windows update it was a little above 10GB, and after the update it's at around 14GB. My GPU has 10GB of VRAM, so I assume the extra usage no longer fits on the card and that's why I'm getting 56 hours.

Current Windows 11 version is 22631.4037.

I don't know enough about machine learning or computer science to really understand what is happening, or to provide better feedback. But the only thing that changed on my system was that Windows update. I've also tried updating Kohya to the latest build, but the result is exactly the same. I can't train anymore.
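
For anyone who wants to double-check the VRAM numbers described above, here is a minimal sketch that reads dedicated GPU memory through `nvidia-smi`. It assumes an NVIDIA card with `nvidia-smi` available on the PATH. The helper names `parse_memory_csv` and `query_gpu_memory` are made up for illustration and are not part of Kohya.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' pairs (MiB), one GPU per line, into a list of tuples."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for used and total dedicated memory of every GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` once before training starts and again while it runs gives two readings to compare. This only reports dedicated VRAM, so a reading pinned at the card's total during training would be consistent with the overflow assumption, but it would not prove it.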

Ofuld commented 2 months ago

Both ComfyUI and Forge are working normally. Their VRAM usage is as it was before.

nokeyfan commented 2 months ago

If you trained before the update, you can load the JSON file with the settings from that run. Every training run creates a JSON file, so try that if you still have an older one. I've never experienced a slowdown caused by a Windows update. It could be something that got unchecked, like gradient checkpointing, or a missing optional parameter.
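
One way to spot a setting that silently changed is to diff the old and new settings files. This is a rough sketch, assuming both files are flat JSON objects. The function names and the file names in the usage note are hypothetical, not something Kohya provides.

```python
import json

# Placeholder shown when a key exists in only one of the two files.
MISSING = "<missing>"


def load_config(path):
    """Read one saved settings file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before = old.get(key, MISSING)
        after = new.get(key, MISSING)
        if before != after:
            changes[key] = (before, after)
    return changes
```

Usage would look like `diff_configs(load_config("old_run.json"), load_config("new_run.json"))`, with those two file names standing in for your own. An empty result means the two runs used identical settings, which would point away from the configuration as the cause.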

Ofuld commented 2 months ago

Already tried that, I'm afraid. I'm beyond confused! Maybe I need to update some component I have no knowledge of.