bmaltais / kohya_ss

Apache License 2.0

how to use Accelerate Lora tab to select which gpu ID to use #2577

Closed rafstahelin closed 1 week ago

rafstahelin commented 4 weeks ago

I often train on 4 GPUs on RunPod. I usually use the command line with CUDA_VISIBLE_DEVICES=0,1,2,... etc.
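
For reference, this is roughly what I run from the shell today (the script and config names here are just placeholders for my actual commands):

# one training process per GPU, each pinned to a device via CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 train_network.py --config_file gpu0.toml &
CUDA_VISIBLE_DEVICES=1 accelerate launch --num_processes=1 train_network.py --config_file gpu1.toml &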

Can anyone direct me to the GUI tab instead? image

I've tried it, but when I set it to one of the available GPUs I get a warning that I have selected multi-GPU, when in fact I just want to run one kohya process per GPU.

Anyone?

image

b-fission commented 3 weeks ago

You say you selected the Multi-GPU checkbox? Just unselect it, and it should launch with the given GPU IDs in the CUDA_VISIBLE_DEVICES var for you. If it's already unselected, the option might be turned on elsewhere, such as the default config for accelerate.
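
As I understand it (an assumption about what the GUI generates, not something I've verified in the code), selecting GPU ID 1 with Multi-GPU unchecked should boil down to roughly this:

# single process, visible only to GPU 1
CUDA_VISIBLE_DEVICES=1 accelerate launch --num_processes=1 ./sd-scripts/train_network.py  # ...plus the training arguments the GUI fills in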

rafstahelin commented 3 weeks ago

Makes sense. Will try now. BTW, do you know a command to follow each GPU's individual process, the way I can monitor the steps when I launch the CUDA process from the activated env?

b-fission commented 3 weeks ago

I don't know of any easy way to track multiple training processes.

Maybe you could launch several instances of kohya gui in separate terminal shells and note which port numbers they use (7860, 7861, etc), then launch training from each gui session, and observe from each terminal.

But if you're manually running the training commands from a shell instead of the web gui, have you tried using multiplexers like screen or tmux? It'd allow you to have split-screen terminals without requiring a full desktop session or window manager.
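
A rough sketch with tmux (the training command is just an illustration of the one-process-per-GPU idea, not an exact recipe):

# one detached tmux session per GPU, each running its own training process
tmux new-session -d -s gpu0 "CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 train_network.py --config_file gpu0.toml"
tmux new-session -d -s gpu1 "CUDA_VISIBLE_DEVICES=1 accelerate launch --num_processes=1 train_network.py --config_file gpu1.toml"
tmux attach -t gpu0   # watch GPU 0's output; detach with Ctrl-b then d
tmux attach -t gpu1   # same for GPU 1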

rafstahelin commented 3 weeks ago

Regarding this, would I expose HTTP ports in Pod Edit, and then assign each one to an Accelerate process? And then open the port and tail the logs? I think I understand.


rafstahelin commented 3 weeks ago

I unchecked Multi GPU and selected GPU ID 1 (I have two GPUs currently on RunPod, so 0 and 1).

But I get the following message, which doesn't make sense:

The following values were not passed to `accelerate launch` and had defaults used instead:
                More than one GPU was found, enabling multi-GPU training.
                If this was unintended please pass in `--num_processes=1`.
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

image

This worked correctly. It automatically assigned GPU=1 instead of 0

But to follow it on an exposed port, I suppose I would add the port under Main process port.

How I would then open a terminal for that port is my only question.

b-fission commented 3 weeks ago

But I get the following message, which doesn't make sense:

The following values were not passed to `accelerate launch` and had defaults used instead:
                More than one GPU was found, enabling multi-GPU training.
                If this was unintended please pass in `--num_processes=1`.
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

Did you set the "Number of processes" option to 1?

accel
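
For comparison, a fully explicit command line that avoids that warning would look something like this (illustrative only; the GUI builds the equivalent command from those options, and the config file name is a placeholder):

CUDA_VISIBLE_DEVICES=1 accelerate launch --num_processes=1 --num_machines=1 ./sd-scripts/train_network.py --config_file my_config.toml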

But to follow it on an exposed port, I suppose I would add the port under Main process port.

How I would then open a terminal for that port is my only question

What are you using to get a terminal, is it SSH? You should be able to open more SSH sessions and start additional kohya gui instances (without needing to change any server configs). That's essentially what I suggested earlier with multiple ports and terminal shells.
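
For example, from a second SSH session (the install path is whatever your pod uses; if you omit --server_port, gradio will typically bump to the next free port on its own):

# second terminal: start another GUI instance on its own port
cd /workspace/kohya_ss
./gui.sh --listen 0.0.0.0 --server_port 7861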

rafstahelin commented 3 weeks ago

As far as I know, I'm opening each GUI by connecting to port 3000, which opens 3001. I'm guessing it's just that simple: open new GUIs and set the accelerate options. But then how do I get a new terminal for that instance? I'm basically on Jupyter on port 8888.


b-fission commented 3 weeks ago

I think you've got the right idea. Each kohya instance gives a gui on a new port number (and URL), which is listed in the log output.

I don't know how it behaves on runpod (haven't used it) but if I run a local instance of the gui, the gradio port number would start at 7860. Running more instances would bump it to 7861 and 7862 in a new URL, all of which I could access directly in my web browser.

Using shell=True when running external commands...
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.
--------
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://_______.gradio.live

Now if you're talking about what to do with the "Main process port" option, you can ignore that. It's used for training on multiple GPUs across different machines, which is not what we're doing here.
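
For reference, those main-process options would only matter in a multi-node launch, roughly like the sketch below (all values are placeholders; main_process_ip/port point at the coordinating machine). Not our case here.

accelerate launch --num_machines=2 --machine_rank=0 --num_processes=2 \
  --main_process_ip=10.0.0.1 --main_process_port=29500 train_network.py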

rafstahelin commented 3 weeks ago

I see what you mean about starting new kohya processes locally, which bumps the port.

But on RunPod I'm not sure how to start new kohya instances on other ports. When I click on port 3000 for kohya I only get this: https://dyx9iwwjjtstga-3000.proxy.runpod.net/

I don't know how to see the port for each GPU.

This is the JupyterLab terminal:


rafstahelin commented 3 weeks ago

Ok, got the solution via the bros on RunPod:

cd /workspace/kohya_ss
nohup ./gui.sh --listen 0.0.0.0 --server_port 3001 --headless > /workspace/logs/kohya_ss_port3001.log 2>&1 &

which exposes the port and creates a log for the port

Then I select port 3001 for that kohya instance and pick GPU ID 0, 1, etc.

Just unclear about the Main process port option, but working otherwise! Stoked!
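
For anyone else on RunPod, the same pattern extended to one GUI per GPU looks roughly like this (ports and log paths are just what I chose; adjust for your pod):

cd /workspace/kohya_ss
mkdir -p /workspace/logs
# one headless GUI per port; pick a different GPU ID in each GUI's Accelerate launch settings
nohup ./gui.sh --listen 0.0.0.0 --server_port 3001 --headless > /workspace/logs/kohya_ss_port3001.log 2>&1 &
nohup ./gui.sh --listen 0.0.0.0 --server_port 3002 --headless > /workspace/logs/kohya_ss_port3002.log 2>&1 &
# follow one instance's training output
tail -f /workspace/logs/kohya_ss_port3001.log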
