GPU is not utilized on versions 0.13.0 and above, including latest version

Describe the bug Hello, I've tried pulling the latest versions of TabbyML (0.17 and 0.18) and they're clearly not using CUDA: I see spikes in CPU load and no load on the GPU.

It looks like TabbyML versions 0.13.0 and above are not using CUDA on my side. The last version that works for me is 0.12.0.

It's very similar to the bugs https://github.com/TabbyML/tabby/issues/2551 and https://github.com/TabbyML/tabby/issues/2548

Information about your version 0.13.0

Information about your GPU

Sat Oct  5 15:01:26 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.90                 Driver Version: 565.90         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   50C    P8             24W /  290W |    4368MiB /   8192MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Additional context I'm using docker-compose on Windows 11 (tried in WSL too, same behaviour):

  tabby:
    restart: always
    image: tabbyml/tabby:0.13.0
    command: serve --model TabbyML/StarCoder-1B --device cuda
    #command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - ".tabby:/data"
    ports:
      - 8089:8080
    # mem_limit: 8g
    # cpus: '2.0'
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
              count: 1

I've tried the standalone exe files too, but they don't use the GPU either.

Something to add: I've checked it on my work PC with the same video card, and the latest version works there! That's strange 🤨

Mon Oct  7 15:11:25 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.22                 Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   49C    P2             78W /  290W |    5680MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
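The two nvidia-smi outputs above show the same card but different driver and CUDA versions. Here is a minimal sketch that pulls those versions out of the header lines so the two setups can be compared side by side. The names `FAILING_PC`, `WORKING_PC` and `parse_header` are made up for illustration, and whether the driver difference actually causes the problem is only a guess, not something confirmed here.

```python
import re

# Header lines copied verbatim from the two nvidia-smi outputs in this report.
# FAILING_PC / WORKING_PC are illustrative names, not anything Tabby defines.
FAILING_PC = "| NVIDIA-SMI 565.90                 Driver Version: 565.90         CUDA Version: 12.7     |"
WORKING_PC = "| NVIDIA-SMI 552.22                 Driver Version: 552.22         CUDA Version: 12.4     |"

HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_header(line):
    """Return (driver_version, cuda_version) from an nvidia-smi header line."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not an nvidia-smi header line")
    return match.group(1), match.group(2)


failing = parse_header(FAILING_PC)
working = parse_header(WORKING_PC)

print("GPU not used:", failing)
print("GPU used:    ", working)
print("same driver: ", failing[0] == working[0])
```

If the driver version turns out to matter, this at least makes the difference between the two machines explicit when reporting further results.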

TabbyML / tabby

GPU is not utilized on versions 0.13.0 and above, including latest version #3242