bmaltais / kohya_ss

Apache License 2.0

How to train LORA with multiple GPUs #2586

Open martindellavecchia opened 3 months ago

martindellavecchia commented 3 months ago

After resolving the lack of AVX support on my GPU here: https://github.com/bmaltais/kohya_ss/issues/2582 (thanks @b-fission), I went ahead and kicked off my LoRA training, but it started training using just one of my two available GPUs.

I've tried running setup.bat and configured accelerate, specifying that it should use all available GPUs, but that doesn't fix it.

Then I went to the web interface, checked "multi GPU", and selected two processes, but nothing changed. I got:

11:10:46-280029 INFO Command executed. [2024-06-10 11:10:52,058] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs. [W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-413GD2B]:29500 (system error: 10049 - La dirección solicitada no es válida en este contexto.).

(The Spanish system error translates to: "The requested address is not valid in this context.")

I read on the internet that I need to add "set CUDA_VISIBLE_DEVICES=1" to my gui.bat, which I did, but it kept training my LoRA using just one GPU.

Is there any guide on how to properly tell kohya to use both the 2080 Ti and the 3060 connected? Kohya sees them present in the system.

Thanks so much !

b-fission commented 3 months ago

Also, you should not need to set `CUDA_VISIBLE_DEVICES` unless you want to hide the other GPUs from training.

That variable takes a comma-separated list of GPU index numbers (starting at zero). Since your system has three GPUs, you could specify `0,1,2` to enable all three, or some subset such as `0,1` or `0,2` to use two GPUs for the process. Using `1` restricts it to GPU index 1 only.
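For example, a minimal sketch of what that could look like near the top of gui.bat (the index values here are just illustrations for a three-GPU system):

```bat
REM Expose all three GPUs to training (indices are zero-based)
set CUDA_VISIBLE_DEVICES=0,1,2

REM Or restrict training to GPUs 0 and 2 only (comment out the line above first)
REM set CUDA_VISIBLE_DEVICES=0,2
```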

martindellavecchia commented 3 months ago

Hi @b-fission, thanks for the quick turnaround. In reply to your questions:

Regarding `CUDA_VISIBLE_DEVICES`: I am not using it. I've noticed that adding it to enable the three GPUs didn't distribute the load among them; it just made them visible to kohya.

So far I haven't found a way to "distribute" the load between them.

b-fission commented 3 months ago

> I am training on SD1.5 - a LoRA

I don't know if multi-GPU training is supported for SD 1.5, since most discussions I've found have talked about SDXL.

> using `--ddp_gradient_as_bucket_view` and `--ddp_static_graph` (where should I set them)?

There's a textbox labeled "Additional parameters" where you can paste the option. Start with `--ddp_gradient_as_bucket_view` first and see if that does anything.
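For reference, whatever you put in that textbox just gets appended to the training command that the GUI builds, so the end result is roughly along these lines (the script name, paths, and other options here are placeholders, not your actual config):

```bat
REM Rough sketch only - the real command the GUI generates contains many more options
accelerate launch --multi_gpu --num_processes=2 train_network.py ^
    --pretrained_model_name_or_path="path\to\sd15_model.safetensors" ^
    --train_data_dir="path\to\dataset" ^
    --output_dir="path\to\output" ^
    --ddp_gradient_as_bucket_view
```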

*(screenshot: params)*

> So far I haven't found a way to "distribute" the load between them.

If distributed training doesn't work for SD 1.5, you could certainly try to train 3 LoRAs simultaneously instead 😉
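If you did want to go that route, a rough sketch of the idea would be one training run per console window, each pinned to one GPU (the config filenames and the `--config_file` usage here are just hypothetical examples):

```bat
REM Console 1: first LoRA on GPU 0 (filenames are hypothetical)
set CUDA_VISIBLE_DEVICES=0
accelerate launch train_network.py --config_file lora_a.toml

REM Console 2: second LoRA on GPU 1
set CUDA_VISIBLE_DEVICES=1
accelerate launch train_network.py --config_file lora_b.toml

REM Console 3: third LoRA on GPU 2
set CUDA_VISIBLE_DEVICES=2
accelerate launch train_network.py --config_file lora_c.toml
```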

martindellavecchia commented 3 months ago

Still no luck; nothing changed with those parameters. I'll try training the LoRA on SDXL.

martindellavecchia commented 3 months ago

I am looking to improve the rendering capacity of my rig; I get kicked out of training models due to insufficient VRAM (11 GB). I thought maybe there was a way to distribute the load between the 3060 and the 2080 Ti.

lowrankin commented 3 months ago

> I am looking to improve the rendering capacity of my rig; I get kicked out of training models due to insufficient VRAM (11 GB). I thought maybe there was a way to distribute the load between the 3060 and the 2080 Ti.

Unfortunately, that's not how distributed training works. If training takes 12 GB of VRAM, two 8 GB cards do not combine into a 16 GB pool; instead, kohya tries to run a 12 GB process on each of the 8 GB cards (in my example). If you add a card with insufficient VRAM for the training setup, it will fail every time.