MalikKillian opened this issue 7 months ago
See a prototype implementation here: https://github.com/TabbyML/tabby/issues/624#issuecomment-1778283444
@wsxiaoys Not sure how I'm supposed to leverage this implementation. It doesn't seem to be the default behavior when using the Docker container. As far as I can tell it's not even part of the image. 😞
Please describe the feature you want
I use my computer to run Stable Diffusion (ComfyUI) as well as Tabby. I eventually noticed that ComfyUI runs very slowly when Tabby is also running. Tabby appears to reserve GPU memory and only release it when the [Docker] container is stopped (not merely paused). It's inconvenient to have to stop Tabby completely to use my GPU for other things.
Tabby should implement a configurable idle timeout: after X seconds without requests, Tabby would release its GPU memory and only reserve it again when a new request comes in. From simple observation, starting a TabbyML container and receiving a response takes no more than a few seconds, so I wonder whether there's any benefit to keeping the memory reserved at all. A sketch of the idea follows below.
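For illustration, here's a minimal Rust sketch of what such an idle timeout could look like: a background watcher drops the loaded model after the configured idle period, and the next request lazily reloads it. The `Model` and `IdleUnloader` types here are hypothetical stand-ins, not Tabby's actual internals.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for a loaded model that holds GPU memory;
// not Tabby's real API.
struct Model;

impl Model {
    fn load() -> Model {
        println!("loading model (reserving GPU memory)");
        Model
    }
    fn complete(&self, prompt: &str) -> String {
        format!("completion for: {prompt}")
    }
}

struct IdleUnloader {
    model: Mutex<Option<Model>>,
    last_used: Mutex<Instant>,
    idle_timeout: Duration,
}

impl IdleUnloader {
    fn new(idle_timeout: Duration) -> Arc<Self> {
        let this = Arc::new(IdleUnloader {
            model: Mutex::new(None),
            last_used: Mutex::new(Instant::now()),
            idle_timeout,
        });
        // Background watcher: drop the model once it has been idle too long.
        let watcher = Arc::clone(&this);
        thread::spawn(move || loop {
            thread::sleep(watcher.idle_timeout / 2);
            let idle = watcher.last_used.lock().unwrap().elapsed();
            if idle >= watcher.idle_timeout {
                let mut model = watcher.model.lock().unwrap();
                if model.take().is_some() {
                    println!("idle for {idle:?}, releasing GPU memory");
                }
            }
        });
        this
    }

    // Lazily (re)load the model on demand, then serve the request.
    fn complete(&self, prompt: &str) -> String {
        *self.last_used.lock().unwrap() = Instant::now();
        let mut slot = self.model.lock().unwrap();
        let model = slot.get_or_insert_with(Model::load);
        model.complete(prompt)
    }
}

fn main() {
    let server = IdleUnloader::new(Duration::from_secs(1));
    println!("{}", server.complete("fn main() {"));
    thread::sleep(Duration::from_secs(2)); // exceed the idle timeout
    println!("{}", server.complete("struct Foo")); // triggers a reload
}
```

The only cost is the reload latency on the first request after an idle period, which the observation above suggests is just a few seconds anyway.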
Additional context
Tabby version: 0.9.1
Model: StarCoder-3B
Docker Desktop (Windows) version: 4.19.0 (106363)
Output from `nvidia-smi` while the Tabby container is idle (no requests for >60 seconds): (screenshot not captured)
Output from `nvidia-smi` while the Tabby container is stopped: (screenshot not captured)
Please reply with a 👍 if you want this feature.