-
### Motivation
Currently, if you pass the model name to lmdeploy:
```
docker run -d --runtime nvidia --gpus '"device=0"' \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING…
-
### System Info / 系統信息
Package Version
--------------------------------- --------------
absl-py 2.1.0
accelerate 0.33.0
…
-
Could you tell me which package versions are required to deploy this model? My deployment succeeds, but as soon as I call the API it returns a 500 error.
The versions in use are as follows:
sh-4.2$ pip list | grep -P "vllm|torch|cuda"
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-r…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
Make Triton an *optional* model-serving component in VDP; users can enable it depending on whether they want to self-host their models on VDP via Triton.
For each model that is deployed via T…
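One way to make a service opt-in like this is Docker Compose service profiles, which keep a service out of the default stack until a flag enables it. The fragment below is a hypothetical sketch, not VDP's actual configuration; the service name, image tag, model path, and port are all assumptions:

```yaml
# Hypothetical compose fragment: Triton only starts when the
# "triton" profile is explicitly enabled, e.g.
#   docker compose --profile triton up -d
services:
  triton:
    profiles: ["triton"]        # opt-in: skipped by a plain `docker compose up`
    image: nvcr.io/nvidia/tritonserver:23.10-py3   # assumed tag
    command: tritonserver --model-repository=/models
    volumes:
      - ./model-repository:/models   # assumed host path for self-hosted models
    ports:
      - "8001:8001"             # gRPC endpoint other services would call
```

With profiles, `docker compose up` without `--profile triton` simply omits the service, so the rest of the stack runs without any Triton dependency.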
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
The latest public Triton now supports dispatching the compilation flow to a third-party plug-in.
However, some miscellaneous changes are still required to make the XPU backend work.
1. Add XPU …
-
OpenAI upstream commit https://github.com/intel/intel-xpu-backend-for-triton/commit/2dd9d74527f431e5e822b8e67c01900e4d0bfef3 removes `TritonGPUToLLVMBase.h`; we added `Target target` in `ConvertTriton…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…
-
I tried to follow the steps on my Windows PC, but I'm running into the following issue:
```
(myenv) PS C:\Users\AI-Install\Documents\transcribe\whisper-diarization> pip install -r requirements.txt
Col…