AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Feature Request]: Idle mode to save power consumption #9810

Open echoidcf opened 1 year ago

echoidcf commented 1 year ago

Is there an existing issue for this?

What would your feature do ?

After the webui is launched, a model is loaded into VRAM and the whole GPU switches to P0 mode, the performance mode, which consumes a lot of power even when you do nothing (50 W to 70 W for me).

This is not suitable for 24/7 operation on my server; I have to shut it down manually after using it and bring it up again before the next use.

Is there an option to just unload everything and stand by? I know this will add overhead to load the model again, but that is much better than a 50-70 W idle power cost, especially for 24/7 usage.

Proposed workflow

Start webui.sh with a special option, and webui will automatically unload everything after generating an image. Or, even better: with a special option, webui will unload everything after a while and automatically load the model again before the next usage.

Additional information

No response

BretG137 commented 1 year ago

I came here with the same questions. It would be nice if there was a way to unload the model from the webui, similar to oobabooga.

missionfloyd commented 1 year ago

@BretG137 There is, it's in Settings > Actions. It can also be done with the API.
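For reference, the unload action mentioned above is exposed by recent webui versions as `POST /sdapi/v1/unload-checkpoint` (with `reload-checkpoint` as its counterpart). A minimal stdlib-only sketch, assuming webui was launched with `--api` on the default `127.0.0.1:7860`:

```python
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # assumption: default host/port, webui launched with --api

def endpoint(base: str, name: str) -> str:
    # Builds the /sdapi/v1/... route for a named action
    return f"{base}/sdapi/v1/{name}"

def post(url: str) -> None:
    # Fire-and-forget POST; these action routes take no request body
    req = urllib.request.Request(url, data=b"", method="POST")
    urllib.request.urlopen(req, timeout=60)

def unload_checkpoint(base: str = BASE_URL) -> None:
    # Same effect as Settings > Actions > "Unload SD checkpoint to free VRAM"
    post(endpoint(base, "unload-checkpoint"))

def reload_checkpoint(base: str = BASE_URL) -> None:
    # Loads the checkpoint back in before the next generation
    post(endpoint(base, "reload-checkpoint"))
```

A cron job or script could call `unload_checkpoint()` off-hours and `reload_checkpoint()` before use.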

w-e-w commented 1 year ago

setting > actions > Unload SD checkpoint to free VRAM

BretG137 commented 1 year ago

@BretG137 There is, it's in Settings > Actions. It can also be done with the API.

Oh I see, thank you! It actually doesn't lower the clocks, which is what I think the OP was asking about, but I suppose that is not easily possible.

echoidcf commented 1 year ago

@BretG137 There is, it's in Settings > Actions. It can also be done with the API.

I tried, but it does not actually change anything. This is before webui is started:

Mon Apr 24 01:17:11 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                       On | 00000000:01:00.0 Off |                  Off |
| N/A   26C    P8                9W / 125W|      0MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

webui started, but on standby:

Mon Apr 24 01:19:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                       On | 00000000:01:00.0 Off |                  Off |
| N/A   33C    P0               50W / 125W|   2596MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1816701      C   python3                                    2594MiB |
+---------------------------------------------------------------------------------------+

after unloading the model checkpoint:

Mon Apr 24 01:21:51 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                       On | 00000000:01:00.0 Off |                  Off |
| N/A   38C    P0               51W / 125W|    514MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1816701      C   python3                                     512MiB |
+---------------------------------------------------------------------------------------+

As you can see, after unloading the model the VRAM is freed, but the GPU is still working in the P0 state at 51 W. I have to shut down the process completely before it goes back to normal.

w-e-w commented 1 year ago

I think it has more to do with how your GPU is configured, and different cards have different default power states. For example, my 3090 at idle, whether or not webui is loaded, stays at P8 (24 W) as long as it's not doing actual work; when doing work it goes up to P2 (350 W).

Not sure, but I think it's possible to use nvidia-smi to power-limit the GPU. I read somewhere that, depending on the clock speed it is set to, your GPU goes into different power states. These might be of use: `--power-limit` and `--cuda-clocks`. Note: I have not researched this in detail and don't know whether modifying these settings would cause other issues, so if you decide to play around with them, do it at your own risk.
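Such nvidia-smi invocations could be wrapped in a small script like the sketch below. The wattage and clock values are examples for illustration only, setting the power limit usually requires root/admin, and the at-your-own-risk caveat above applies:

```python
import subprocess

def power_limit_cmd(watts: int, gpu: int = 0) -> list[str]:
    # nvidia-smi -pl caps the board power limit in watts (requires root)
    return ["nvidia-smi", "-i", str(gpu), "-pl", str(watts)]

def lock_clocks_cmd(min_mhz: int, max_mhz: int, gpu: int = 0) -> list[str]:
    # nvidia-smi -lgc pins the GPU clocks to a range; -rgc resets them
    return ["nvidia-smi", "-i", str(gpu), "-lgc", f"{min_mhz},{max_mhz}"]

def run(cmd: list[str]) -> None:
    # Executes the command, raising if nvidia-smi reports an error
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(power_limit_cmd(75))  # example: cap the card at 75 W
```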

echoidcf commented 1 year ago

I think it has more to do with how your GPU is configured, and different cards have different default power states. For example, my 3090 at idle, whether or not webui is loaded, stays at P8 (24 W) as long as it's not doing actual work; when doing work it goes up to P2 (350 W).

Not sure, but I think it's possible to use nvidia-smi to power-limit the GPU. I read somewhere that, depending on the clock speed it is set to, your GPU goes into different power states. These might be of use: `--power-limit` and `--cuda-clocks`. Note: I have not researched this in detail and don't know whether modifying these settings would cause other issues, so if you decide to play around with them, do it at your own risk.

It may depend on the driver, OS, configuration, or something like that. But an option that unloads everything would surely be a solution for everyone, on every configuration.

w-e-w commented 1 year ago

I don't think having this inside the web UI is easy. I bet it could be done, but it would take lots of work, and even then it would be hard to account for extensions.

I think the best way to approach it is to have some sort of proxy server that monitors incoming traffic and usage: if there is incoming traffic, start the webui; if there is no incoming traffic, or the web server has not been running a job for a period of time, kill the webui.
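The kill-after-idle half of that proxy idea can be sketched as a small watchdog. This is an illustrative skeleton only (class and callback names are made up); a real proxy would call `touch()` on every forwarded request and pass a callback that terminates the webui subprocess:

```python
import threading
import time

class IdleWatchdog:
    """Calls `on_idle` once no activity has been seen for `timeout` seconds."""

    def __init__(self, timeout: float, on_idle):
        self.timeout = timeout
        self.on_idle = on_idle          # e.g. lambda: webui_proc.terminate()
        self.last_seen = time.monotonic()
        self._lock = threading.Lock()

    def touch(self) -> None:
        # Call this from the proxy on every incoming request
        with self._lock:
            self.last_seen = time.monotonic()

    def idle_for(self) -> float:
        # Seconds since the last request was seen
        with self._lock:
            return time.monotonic() - self.last_seen

    def watch(self, poll: float = 1.0) -> None:
        # Poll until the idle timeout elapses, then fire the callback once
        while True:
            time.sleep(poll)
            if self.idle_for() >= self.timeout:
                self.on_idle()
                return
```

The start-on-traffic half would live in the proxy itself: on a connection with no webui running, spawn it, wait for the port to open, then forward.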

echoidcf commented 1 year ago

I don't think having this inside the web UI is easy. I bet it could be done, but it would take lots of work, and even then it would be hard to account for extensions.

I think the best way to approach it is to have some sort of proxy server that monitors incoming traffic and usage: if there is incoming traffic, start the webui; if there is no incoming traffic, or the web server has not been running a job for a period of time, kill the webui.

I think this is quite easy to do. I believe just freeing some objects will do the trick; a FastAPI server alone won't touch the GPU. I will look into the source code when I have some time.

zsjnhfj commented 1 year ago

waiting for a good answer

BlankFX1 commented 10 months ago

Just a quick tip for users running Windows or Windows Server: you can use Multi Display Power Saver, a tool integrated into Nvidia Inspector. It forces your GPU to remain in its lowest possible power state and also lets you set a usage limit before it goes to P0, or a process whitelist for it. I've been using it for over a decade.

ERomanchuck commented 9 months ago

Hi! How many it/s does the P40 do? Is there a reason to move up from a Tesla K80? Thanks!

echoidcf commented 9 months ago

Hi! How many it/s does the P40 do? Is there a reason to move up from a Tesla K80? Thanks!

The P40 is almost the same as a GTX 1080. Not very fast, but I think it is OK. I sold my P40; now I am using a 3060 12GB.

space192 commented 7 months ago

any update on the subject ?

randelreiss commented 2 months ago

Ollama framework has a really handy environment and API accessible variable:

OLLAMA_KEEP_ALIVE=[# of seconds] | [xM] | 0

I think it's mostly used by people who want the last loaded chat model to stay loaded longer, but I set it to zero to keep the GPU VRAM as empty as possible, as soon as possible. This is because I have many users who mostly use the GPU for chat and only occasionally for text-to-speech and SD image creation, which fills up the GPU VRAM. Unfortunately, SDWeb keeps its last model loaded indefinitely. It would be great if SDWeb had a similar keep-alive option to let us decide how long to keep the last model loaded.
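That keep-alive behavior could be approximated outside SDWeb today by resetting a timer after each generation and calling the unload action when it fires. A sketch (class name and callback are made up; in practice `unload` would POST to the webui's `/sdapi/v1/unload-checkpoint` route):

```python
import threading

class KeepAlive:
    """Mimics OLLAMA_KEEP_ALIVE: after each generation, (re)start a
    countdown; when it expires, call `unload` to free the model.
    A value of 0 unloads immediately, as in the Ollama usage above."""

    def __init__(self, seconds: float, unload):
        self.seconds = seconds
        self.unload = unload        # e.g. POST /sdapi/v1/unload-checkpoint
        self._timer = None

    def generation_finished(self) -> None:
        # Cancel any pending countdown so the clock restarts from now
        if self._timer is not None:
            self._timer.cancel()
        if self.seconds == 0:
            self.unload()           # keep_alive=0: free VRAM right away
        else:
            self._timer = threading.Timer(self.seconds, self.unload)
            self._timer.daemon = True
            self._timer.start()
```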

Covfefe3 commented 1 month ago

Update: Fixed after I restarted my PC. The GPU now goes into idle mode even while webui is loaded, and after image generation the GPU clock speed returns to idle as before.

The reason I am here is that I just noticed my GPU (RTX 3060 12GB) clock staying at max even after an image is generated. I have had to manually shut down webui to bring the clock speed down. Normally, after image generation the GPU goes into idle mode and stays there until I start a new generation job, which has helped me keep my power consumption down and my GPU temps low. I did not know others were constantly facing this issue.

0xE1 commented 1 month ago

The P40 idles at ~50 W, which is about 1.2 kWh per day and 438 kWh per year: around €61 at €0.14/kWh or €131 at €0.30/kWh, instead of a potential €10 or €21 if it idled at P8.
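The arithmetic checks out; a quick script to reproduce it (the 9 W figure is the P8 reading from the nvidia-smi dump earlier in the thread):

```python
# Yearly energy and cost of a GPU idling 24/7 at a constant draw.
def yearly_kwh(watts: float) -> float:
    # watts * hours per year, converted to kWh
    return watts * 24 * 365 / 1000

def yearly_cost(watts: float, eur_per_kwh: float) -> float:
    return yearly_kwh(watts) * eur_per_kwh

print(round(yearly_kwh(50)))         # 438 kWh/year at a 50 W P0 idle
print(round(yearly_cost(50, 0.14)))  # ~61 EUR/year at 0.14 EUR/kWh
print(round(yearly_cost(50, 0.30)))  # ~131 EUR/year at 0.30 EUR/kWh
print(round(yearly_cost(9, 0.14)))   # ~11 EUR/year if the card idled at P8 (9 W)
```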