-
After successfully deploying some models with this Dockerfile:
```
FROM python:3.11
# It's good practice to update pip to ensure we can handle recent package specifications
RUN pip inst…
-
### System Info
CUDA==12.1
transformers==4.44.2
llama_cpp_python==0.2.90
vllm==0.6.1.post2
vllm-flash-attn==2.6.1
Python==3.10.14
Ubuntu==24.04
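As a minimal sketch (standard library only), a version report like the list above can be regenerated with `importlib.metadata`; the package names below are taken from that list, and missing packages are flagged rather than raising:

```python
# Print installed versions for the packages listed above, in the same
# "name==version" style; a missing package is reported instead of crashing.
import sys
from importlib.metadata import version, PackageNotFoundError

packages = ["transformers", "llama_cpp_python", "vllm", "vllm-flash-attn"]

print(f"Python=={sys.version.split()[0]}")
for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```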
### Running Xinference wit…
-
When I enable flashinfer, it reports the following error:
```
Exception in ModelRpcClient:
Traceback (most recent call last):
File "/home/wst4sgh/playground/sine/.venv/lib/python3.10/site-packa…
-
### Feature Description
Right now the `render` method has the following function signature:
```
/**
* The model name to use. Must be OpenAI SDK compatible. Tools and Functions are only suppor…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC ve…
-
### System Info
- H100
### Who can help?
@kaiy
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` fo…
-
### System Info
CUDA and system:
![image](https://github.com/user-attachments/assets/b36b052b-6c80-4d15-8b83-b078762466e8)
Python environment:
```
Package Version
------------------…
-
Using flashinfer 0.0.3 requires a one-line change (#282), but there is a compatibility issue: the same model runs fine on 0.0.2, while under 0.0.3 sglang throws the following error in an infinite loop:
```
Excepti…
-
@qeternity In PR #286 the Marlin kernel was merged, but when is it actually used?
I have tested a Marlin llama2 model (it works on vLLM), but it does not work on the latest sglang tip.
```
Traceback (most recent call l…
-
Which version of vLLM do you use? And does your vLLM setup use CUDA graphs?
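A small sketch of how both questions could be checked locally. The `enforce_eager=True` flag (which disables CUDA graph capture in vLLM's `LLM` constructor) is my assumption about the relevant knob; the guarded import keeps the snippet runnable even without vLLM installed:

```python
# Report the installed vLLM version, if any; degrades gracefully otherwise.
def describe_vllm() -> str:
    try:
        import vllm
        return f"vllm {vllm.__version__}"
    except ImportError:
        return "vllm not installed"

print(describe_vllm())

# Hypothetical usage (not executed here): run vLLM without CUDA graphs
# to rule out graph-capture issues.
# from vllm import LLM
# llm = LLM(model="<your-model>", enforce_eager=True)  # eager mode, no CUDA graphs
```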