Tech-Arch1tect closed this issue 3 days ago.
Hi, the only suspect PR between rc.2 and rc.3 is https://github.com/TabbyML/tabby/pull/2507 - could you take a look and see if you can find anything suspicious?
Besides, you might also want to try https://tabby.tabbyml.com/docs/quick-start/installation/linux to see if it reveals more information about your setup.
I am experiencing the same behaviour. I also tried to revert the change from the mentioned https://github.com/TabbyML/tabby/pull/2507 by restoring LD_LIBRARY_PATH. To do that, I added the env setting -e LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64" to the docker run command. Now the GPU is being utilized.
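For reference, the full docker run invocation with this workaround might look like the following; note that the image tag, port, volume path, and model names here are assumptions based on the Tabby quick-start docs, not the reporter's exact setup:

```shell
# Workaround: restore LD_LIBRARY_PATH (removed in PR #2507) so the
# container finds the NVIDIA driver libraries and uses the GPU again.
docker run -it --gpus all \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  -e LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64" \
  tabbyml/tabby:0.13.0 \
  serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
```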
However, the chat response indefinitely prints something like 'GGGGGGG' until the 'Stop generating' button is pressed. I don't know if this is related to the LD_LIBRARY_PATH.
Hi, thanks for getting back so quickly. I have cloned tabby (and initialized the submodules) and added the following to my docker compose:
build:
  context: ./tabby
  dockerfile: docker/Dockerfile.cuda
  args:
    RUST_TOOLCHAIN: 1.76.0
I ran docker compose up --build and confirmed the issue was present (GPU unused).
I then reverted the commit with git revert 15e2e34441f28180fbd6ea231884c8bc64ba8ff7, ran docker compose up --build again, and the issue was gone. My GPU is being used after reverting the commit.
Anything I can do to help debug why this commit breaks GPU usage?
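The repro steps above can be sketched as a single shell sequence; the clone URL and the assumption that the compose file referencing docker/Dockerfile.cuda sits in the working directory come from this thread, not from official docs:

```shell
# Clone tabby with submodules (needed to build the CUDA image)
git clone --recurse-submodules https://github.com/TabbyML/tabby.git
cd tabby

# Revert the suspect commit introduced by PR #2507
git revert 15e2e34441f28180fbd6ea231884c8bc64ba8ff7

# Rebuild and run; with the revert applied, GPU usage returns
docker compose up --build
```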
> However, the chat response indefinitely prints something like 'GGGGGGG' until the 'Stop generating' button is pressed.
I see the same issue with chat.
Please see https://github.com/TabbyML/tabby/issues/2550#issuecomment-2198246852 for a discussion of Qwen2 compatibility.
I'm experiencing the opposite behavior: Tabby is utilizing my GPU even when I start the server without the --device cuda parameter.
$ ./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 546.30       CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...   On  | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P0    53W /  80W |   4675MiB /  8188MiB |     75%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11097      C   /llama-server                        N/A |
|    0   N/A  N/A     11128      C   /llama-server                        N/A |
|    0   N/A  N/A     11149      C   /llama-server                        N/A |
+-----------------------------------------------------------------------------+
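As a side note, a quicker way to check whether Tabby's llama-server processes are running on the GPU (assuming nvidia-smi is on PATH) is its query interface rather than the full table:

```shell
# Lists PID and name for every compute process on the GPU;
# Tabby should show up as one or more llama-server entries.
nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader
```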
It's not working on Docker either. #2551
Hi @Srkl - this is actually a bug, fixed in https://github.com/TabbyML/tabby/pull/2552
Describe the bug
After updating to version 0.13.0, tabby starts normally; however, code completion is slow. Digging into it, I can see Tabby is no longer utilising the GPU and is instead using the CPU (evidenced by high CPU usage and the lack of GPU usage, including the lack of GPU VRAM usage).
Information about your version
Information about your GPU
Additional context
I run TabbyML via docker compose in WSL on Windows. I went through the specific docker tags to find where this problem started:
0.12.0: works as expected
0.13.0-rc.1: works as expected
0.13.0-rc.2: works as expected
0.13.0-rc.3: GPU NOT utilised
0.13.0: GPU NOT utilised
There are no errors reported by tabby:
Reproducible docker compose yml:
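The attached compose file did not survive extraction; below is a minimal sketch of what a CUDA-enabled Tabby compose file typically looks like, based on the Tabby docs. The image tag, models, port, and volume path are assumptions, not the reporter's actual file:

```yaml
services:
  tabby:
    image: tabbyml/tabby:0.13.0
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - "8080:8080"
    # Expose the NVIDIA GPU to the container (Compose equivalent of --gpus all)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```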
If I can provide any other info, or perform any troubleshooting that would help identify the issue please let me know.