TinyForge closed this 2 months ago
Looks like a multi-GPU specific issue. Does it work if you train on a single GPU? Multi-GPU has never worked well on Windows. You need Linux for it.
Swapping to 1 GPU has resolved the error. I had no idea that multi-GPU support was poor on Windows. I'll load this up in WSL in the future and give it a try. Marking this as resolved, thanks.
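For anyone who lands here with the same error, the single-GPU workaround can be sketched roughly as below. This is only an illustration, not the exact command used in this thread: `train_script.py` is a placeholder for the real training script, the GPU index `0` is an assumption, and the helper name `single_gpu_launch` is made up for the example. It relies on the `CUDA_VISIBLE_DEVICES` environment variable and the `--num_processes` / `--num_machines` options of `accelerate launch`.

```python
import os
import shlex


def single_gpu_launch(script, script_args=(), gpu_index=0, base_env=None):
    """Build (command, env) for a one-process, one-GPU `accelerate launch`.

    `script` is a placeholder path; nothing here is specific to kohya_ss.
    """
    # Copy the environment so the caller's environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    # Hide every GPU except the chosen one from CUDA.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    command = [
        "accelerate", "launch",
        "--num_processes", "1",   # a single training process
        "--num_machines", "1",    # a single machine
        script,
        *script_args,
    ]
    return command, env


if __name__ == "__main__":
    command, env = single_gpu_launch("train_script.py")
    print(shlex.join(command))
    # To actually start training (assumes `accelerate` is on PATH):
    # import subprocess; subprocess.run(command, env=env, check=True)
```

Restricting visibility through the environment means the training script itself does not need to change, which makes it easy to switch back to multi-GPU later on Linux or WSL.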
Environment
Description
I'm trying to train a LoRA for the Flux model using the kohya_ss repository (flux branch). When running the training script, I encounter the following error:
This error occurs when the accelerate library attempts to launch the script.
Full error log
Steps to reproduce
What I've tried
Verified PyTorch installation:
Output:
Reinstalled PyTorch and torchvision with CUDA support:
Updated accelerate:
Ran accelerate config and chose 'NO' for using PyTorch's built-in distributed module
Verified xformers installation:
Checked for conflicts with pip list
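The checks in the list above could be gathered into one small script, which also reports the GPU count and whether the PyTorch build includes distributed support. This is a sketch for illustration, not the commands originally run for this report. The function name `describe_torch_env` is hypothetical, and the script deliberately returns early when `torch` is not installed.

```python
import importlib.util
import platform


def describe_torch_env():
    """Collect facts that matter when a multi-GPU launch fails."""
    info = {"os": platform.system(), "torch_installed": False}
    # Avoid an ImportError if PyTorch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch
    import torch.distributed

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    # False means this build was compiled without distributed support.
    info["distributed_available"] = torch.distributed.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `gpu_count` is greater than 1 and `os` is `Windows`, trying a single GPU first is a quick way to rule out a multi-GPU launch problem.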
And here is a dump of my training parameters:
Any help in resolving this issue would really be appreciated. Let me know if you need any more information.