TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

Release 0.13.0 no longer utilises my GPU (cuda) #2548

Closed: Tech-Arch1tect closed this issue 3 days ago

Tech-Arch1tect commented 3 days ago

Describe the bug

After updating to version 0.13.0, Tabby starts normally; however, code completion is slow. Digging into it, I can see that Tabby is no longer utilising the GPU and is instead running on the CPU (evidenced by high CPU usage and the complete absence of GPU usage, including GPU VRAM usage).
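
For reference, here is how I checked (the service name tabby matches the compose file further down; running nvidia-smi inside the container assumes the NVIDIA container toolkit makes it available there):

# Watch host GPU utilisation and VRAM while triggering a completion
watch -n 1 nvidia-smi

# Confirm the GPU is visible from inside the running container
docker compose exec tabby nvidia-smi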

Information about your version

# tabby --version
tabby 0.13.0

Information about your GPU

# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 556.12         CUDA Version: 11.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   44C    P0             31W /  200W |    1318MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Additional context

I run TabbyML via docker compose in WSL on Windows. I went through the specific docker tags to find where this problem started:

0.12.0: works as expected
0.13.0-rc.1: works as expected
0.13.0-rc.2: works as expected
0.13.0-rc.3: GPU NOT utilised
0.13.0: GPU NOT utilised
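
For anyone wanting to repeat the bisection, this is roughly the loop I used (a sketch; it assumes the image: line in the compose file is changed to tabbyml/tabby:${TABBY_TAG}):

# Swap the image tag, restart, and check whether VRAM gets allocated
for tag in 0.12.0 0.13.0-rc.1 0.13.0-rc.2 0.13.0-rc.3 0.13.0; do
  TABBY_TAG="$tag" docker compose up -d
  sleep 30       # give the model time to load
  nvidia-smi     # GPU VRAM should jump on a working tag
done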

There are no errors reported by tabby:

$ docker compose logs
tabby-1  |
tabby-1  | ████████╗ █████╗ ██████╗ ██████╗ ██╗   ██╗
tabby-1  | ╚══██╔══╝██╔══██╗██╔══██╗██╔══██╗╚██╗ ██╔╝
tabby-1  |    ██║   ███████║██████╔╝██████╔╝ ╚████╔╝
tabby-1  |    ██║   ██╔══██║██╔══██╗██╔══██╗  ╚██╔╝
tabby-1  |    ██║   ██║  ██║██████╔╝██████╔╝   ██║
tabby-1  |    ╚═╝   ╚═╝  ╚═╝╚═════╝ ╚═════╝    ╚═╝
tabby-1  |
tabby-1  | 📄 Version 0.13.0
tabby-1  | 🚀 Listening at 0.0.0.0:8080
tabby-1  |

Reproducible docker-compose.yml:

$ cat docker-compose.yml
version: '3.5'

services:
  tabby:
    image: tabbyml/tabby:0.13.0
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - 8080:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

If I can provide any other info, or perform any troubleshooting that would help identify the issue, please let me know.

wsxiaoys commented 3 days ago

Hi, the only suspect PR between rc.2 and rc.3 is https://github.com/TabbyML/tabby/pull/2507 - could you take a look and see if you can find anything suspicious?

Besides, you might also want to try https://tabby.tabbyml.com/docs/quick-start/installation/linux to see if it reveals more information in your setup.
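
Running the standalone build directly would also rule out the container layering, e.g. something along these lines:

# Run the standalone Linux binary outside Docker with the same models
./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda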

ila-embsys commented 3 days ago

I am experiencing the same behaviour. I have also tried to revert the change from the mentioned https://github.com/TabbyML/tabby/pull/2507 by restoring LD_LIBRARY_PATH.

To do that, I added the env setting -e LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64" to the docker run command. Now the GPU is being utilised.
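
The full command looks roughly like this (a sketch based on the standard docker run invocation from the docs, plus the override):

# Standard Tabby CUDA invocation plus the restored LD_LIBRARY_PATH
docker run -it --gpus all -p 8080:8080 -v "$HOME/.tabby:/data" \
  -e LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64" \
  tabbyml/tabby:0.13.0 serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda

In the compose setup from the original report, the same value can go under an environment: key on the tabby service.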

However, the chat response prints 'GGGGGGG' indefinitely until the 'Stop generating' button is pressed. I don't know if this is related to the LD_LIBRARY_PATH change.

Tech-Arch1tect commented 3 days ago

Hi, thanks for getting back to me so quickly. I have cloned tabby (and initialised the submodules) and added the following to my docker compose:

build:
  context: ./tabby
  dockerfile: docker/Dockerfile.cuda
  args:
    RUST_TOOLCHAIN: 1.76.0

I ran docker compose up --build and confirmed the issue was present (GPU unused).

I then reverted the commit (git revert 15e2e34441f28180fbd6ea231884c8bc64ba8ff7) and ran docker compose up --build again, and the issue was gone: my GPU is being used after reverting the commit.
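
For reference, the full sequence was roughly:

# Clone with submodules, revert the suspect commit, rebuild, retest
git clone --recurse-submodules https://github.com/TabbyML/tabby.git
cd tabby
git revert 15e2e34441f28180fbd6ea231884c8bc64ba8ff7
cd ..
docker compose up --build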

Anything I can do to help debug why this commit breaks GPU usage?

> However, the chat response prints 'GGGGGGG' indefinitely until the 'Stop generating' button is pressed. I don't know if this is related to the LD_LIBRARY_PATH change.

I see the same issue with chat.

wsxiaoys commented 3 days ago

Please see https://github.com/TabbyML/tabby/issues/2550#issuecomment-2198246852 for a discussion of Qwen2 compatibility.

Srkl commented 3 days ago

I'm experiencing the opposite behavior. Tabby is utilizing my GPU even when I start the server without the --device cuda parameter:

$ ./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 546.30       CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P0    53W /  80W |   4675MiB /  8188MiB |     75%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11097      C   /llama-server                   N/A      |
|    0   N/A  N/A     11128      C   /llama-server                   N/A      |
|    0   N/A  N/A     11149      C   /llama-server                   N/A      |
+-----------------------------------------------------------------------------+
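
My assumption (not verified against Tabby's source) is that --device should control how many layers the bundled llama-server offloads, which llama.cpp exposes via --n-gpu-layers:

# llama.cpp's server controls GPU offload with --n-gpu-layers (-ngl);
# a CPU-only run would be expected to pass 0
llama-server -m model.gguf --n-gpu-layers 0    # CPU only
llama-server -m model.gguf --n-gpu-layers 999  # offload all layers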

PeterTucker commented 3 days ago

It's not working on Docker either. #2551

wsxiaoys commented 3 days ago

Hi @Srkl - this is actually a bug; it has been fixed in https://github.com/TabbyML/tabby/pull/2552.