comfyanonymous / ComfyUI

The most powerful and modular Stable Diffusion GUI, API, and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0
43.1k stars · 4.55k forks

VRAM doesn't unload after a work session (set unload delay to 5m?) #3192

Open angiopteris opened 3 months ago

angiopteris commented 3 months ago

Hello there! Since many of us use several AI tools, it is important to free VRAM when ComfyUI is not in use, so that models can still be reloaded quickly from RAM instead of permanently occupying VRAM. Is there an option for this?

liusida commented 3 months ago

After a little digging, I found that this function is all you need to release all the resources:

https://github.com/comfyanonymous/ComfyUI/blob/30abc324c2f73e6b648093ccd4741dece20be1e5/comfy/model_management.py#L842

But it's still an open question how to call this function from the UI.

liusida commented 3 months ago

When a user closes the window, the WebSocket connection is closed and this line is reached: https://github.com/comfyanonymous/ComfyUI/blob/30abc324c2f73e6b648093ccd4741dece20be1e5/server.py#L117

But I don't see any cleanup for that user, so I assume the functionality isn't there yet?

Also, I don't think the system knows which VRAM is used by which session, so it's not that easy to add this functionality. Any suggestions?

I think a simple way to do it is to make a custom node (plugin) that adds a route, say /unload_all_models, so that visiting that URL frees all VRAM.
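Roughly, such a plugin could look like the sketch below. This is only my guess at the shape, not tested against the real server: the route name is made up, and in a real install `routes` would be `PromptServer.instance.routes` and `unload_all_models` would be `comfy.model_management.unload_all_models`. Both are injected as parameters so the sketch stands alone.

```python
# Hypothetical custom-node sketch: register a POST /unload_all_models route.
# `routes` and `unload_all_models` are passed in so this can be read (and
# tested) outside a running ComfyUI instance.
def register_unload_route(routes, unload_all_models):
    @routes.post("/unload_all_models")
    async def unload_all(request):
        unload_all_models()           # drop every loaded model from VRAM
        from aiohttp import web       # lazy import: ComfyUI's server uses aiohttp
        return web.Response(status=200)
    return unload_all
```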

liusida commented 3 months ago

I opened a PR that tries to free up VRAM when idle, @Lucas-BLP. But I don't anticipate it'll get reviewed soon, given how many PRs are waiting in the queue. If you can't wait, just take a look at my commit and add those lines yourself (fewer than 10 lines in one file).

Have a nice day.

angiopteris commented 3 months ago

Great job, thank you! I'll have a look :)

angiopteris commented 3 months ago

> I opened a PR that tries to free up VRAM when idle, @Lucas-BLP. But I don't anticipate it'll get reviewed soon, given how many PRs are waiting in the queue. If you can't wait, just take a look at my commit and add those lines yourself (fewer than 10 lines in one file).
>
> Have a nice day.

Actually, your solution is nice because, if I understand correctly, the VRAM is freed once the queue is empty (models still remain in RAM anyway, so reload times stay very fast). Also, to build on your comment, there is a route /free that is designed for this:

    @routes.post("/free")
    async def post_free(request):
        json_data = await request.json()
        unload_models = json_data.get("unload_models", False)
        free_memory = json_data.get("free_memory", False)
        if unload_models:
            self.prompt_queue.set_flag("unload_models", unload_models)
        if free_memory:
            self.prompt_queue.set_flag("free_memory", free_memory)
        return web.Response(status=200)
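For what it's worth, that endpoint can be driven from any HTTP client. Here is a small stdlib-only sketch; the 127.0.0.1:8188 address is just ComfyUI's usual default and an assumption about your setup:

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # assumed default ComfyUI address

def build_free_payload(unload_models=True, free_memory=False):
    """JSON body expected by ComfyUI's /free endpoint."""
    return json.dumps(
        {"unload_models": unload_models, "free_memory": free_memory}
    ).encode("utf-8")

def free_vram(unload_models=True, free_memory=False):
    """POST to /free; returns the HTTP status code."""
    req = urllib.request.Request(
        COMFY_URL + "/free",
        data=build_free_payload(unload_models, free_memory),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    print(free_vram(unload_models=True, free_memory=True))
```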
liusida commented 3 months ago

> Actually, your solution is nice because, if I understand correctly, the VRAM is freed once the queue is empty (models still remain in RAM anyway, so reload times stay very fast).

However, if someone keeps the window open, goes idle, and simply leaves, the VRAM won't be freed because the sockets list is not empty.

And thanks for the reminder about the /free API.

pixelass commented 2 months ago

This worked perfectly for us: https://github.com/comfyanonymous/ComfyUI/pull/3229

Simply call the route /free to unload models and/or free memory. I'm pretty sure this issue should be closed, as the feature is already implemented. Closing issues will also help @comfyanonymous focus on other open issues. ❤️

Thank you for the PR @liusida (despite it being closed), which helped us find the feature in the first place.

        @routes.post("/free")
        async def post_free(request):
            json_data = await request.json()
            unload_models = json_data.get("unload_models", False)
            free_memory = json_data.get("free_memory", False)
            if unload_models:
                self.prompt_queue.set_flag("unload_models", unload_models)
            if free_memory:
                self.prompt_queue.set_flag("free_memory", free_memory)
            return web.Response(status=200)
liusida commented 2 months ago

> This worked perfectly for us: #3229
>
> Simply call the route /free to unload models and/or free memory. I'm pretty sure this issue should be closed, as the feature is already implemented. Closing issues will also help @comfyanonymous focus on other open issues. ❤️
>
> Thank you for the PR @liusida (despite it being closed), which helped us find the feature in the first place.


Thank you for telling me it helps. It means a lot to me.

Now I am suggesting that @comfyanonymous leave some callback hooks in server.py so that this cleanup can be done in a custom node. That would be smaller surgery on the main branch: #3429

pixelass commented 2 months ago

@liusida You can already call it. It doesn't even have to be a node; it can be a web/client-based extension in ComfyUI. All you have to do is add a button and make a POST request to the API.

I can try adding it to our client extensions: https://github.com/blib-la/blibla-comfyui-extensions.

But it's basically something like this:

Just save it as unload-models-button.js in ComfyUI\web\extensions


import { app } from "../scripts/app.js";
import { $el } from "../scripts/ui.js";

const nodeName = "custom.unloadModels";

const button = $el(
  "button",
  {
    onclick() {
      fetch("/free", {
        method: "POST",
        body: JSON.stringify({ unload_models: true }),
        headers: { "Content-Type": "application/json" },
      }).catch((error) => {
        console.error(error);
      });
    },
  },
  ["Unload Models"]
);

app.registerExtension({
  name: nodeName,
  async init(app) {
    app.ui.menuContainer.append(button);
  },
});
pixelass commented 2 months ago

I just added the feature via our extension: https://github.com/blib-la/blibla-comfyui-extensions?tab=readme-ov-file#unload-models

liusida commented 2 months ago

@pixelass, a cool extension! And a pure JS solution is very attractive.

Is there a way to call this automatically when the user simply closes the browser and leaves? People might forget to clean up before leaving, and the VRAM may stay occupied until the next time they come back.

pixelass commented 2 months ago

Sure, you can listen to window events to detect whether the window is active, or track mouse movement and offload after N minutes or so. There are several options. Just ask ChatGPT for inspiration if you need it. :)

brodieferguson commented 2 months ago

Ollama unloads its models after 5 minutes. Now that ComfyUI is being added to things like open-webui for image generation, it would be nice to have a similar feature so it plays nicely with multiple projects and models running on one system.
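Until something like that is built in, an external watchdog can approximate Ollama's behaviour by polling ComfyUI's queue and POSTing to /free after a stretch of idleness. A sketch, where the /queue response shape, the address, and the 5-minute delay are all assumptions about a stock setup:

```python
import json
import time
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # assumed default ComfyUI address
IDLE_DELAY = 300                     # seconds of idleness before unloading (5 min)

def is_idle(queue_state):
    """True when both the running and pending queues are empty."""
    return not queue_state.get("queue_running") and not queue_state.get("queue_pending")

def fetch_queue():
    """GET /queue; assumed to return {"queue_running": [...], "queue_pending": [...]}."""
    with urllib.request.urlopen(COMFY_URL + "/queue") as resp:
        return json.load(resp)

def free_models():
    """POST /free to unload models and free memory."""
    req = urllib.request.Request(
        COMFY_URL + "/free",
        data=json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).close()

if __name__ == "__main__":
    idle_since = None
    while True:
        if is_idle(fetch_queue()):
            if idle_since is None:
                idle_since = time.monotonic()   # queue just went quiet
            elif time.monotonic() - idle_since >= IDLE_DELAY:
                free_models()
                idle_since = None               # re-arm on the next busy->idle transition
        else:
            idle_since = None
        time.sleep(30)
```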