-
* In the evaluation steps, `vLLM` is launched locally to serve the candidate models. The code for doing this is not ideal: we (I) stopped at "it works", and it now needs to be revisited and cleaned up (a minimal launch sketch follows this list).
* …
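For context, a minimal sketch of what a cleaned-up local launcher could look like, assuming vLLM's OpenAI-compatible `api_server` entrypoint and its `/health` endpoint; the model path, port, and readiness timeout are illustrative, not the repository's actual code:

```python
# Sketch of a local vLLM launcher for evaluation runs. The port, timeout,
# and model argument are illustrative assumptions.
import subprocess
import time
import urllib.request

def launch_vllm_server(model: str, port: int = 8000) -> subprocess.Popen:
    """Start a local vLLM OpenAI-compatible server and wait until it is ready."""
    proc = subprocess.Popen(
        [
            "python", "-m", "vllm.entrypoints.openai.api_server",
            "--model", model,
            "--port", str(port),
        ]
    )
    # Poll the /health endpoint until the server answers or we give up.
    deadline = time.time() + 300
    url = f"http://localhost:{port}/health"
    while time.time() < deadline:
        try:
            if urllib.request.urlopen(url, timeout=2).status == 200:
                return proc
        except OSError:
            time.sleep(2)
    proc.terminate()
    raise RuntimeError("vLLM server did not become ready in time")
```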
-
### Your current environment
```text
PyTorch version: 2.1.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Home
GCC vers…
-
As described in the research paper, SageAttention quantizes Q, K, and V to INT8 and performs the GEMM in INT8 (with an FP16 accumulator), so I want to know whether it conflicts with or duplicates…
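For reference, a toy simulation of that scheme: per-tensor symmetric INT8 quantization of Q and K, an integer GEMM, then a rescale. The shapes and scales are made up, and the real SageAttention kernels run on-GPU with FP16 accumulators rather than the INT32 accumulation numpy uses here:

```python
# Toy simulation of an INT8 QK^T: quantize, multiply in integers, rescale.
# Everything here is illustrative; it only mimics the numerics of the scheme.
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization to INT8."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 64), dtype=np.float32)
k = rng.standard_normal((4, 64), dtype=np.float32)

q_int8, q_scale = quantize_int8(q)
k_int8, k_scale = quantize_int8(k)

# Integer GEMM (numpy needs a wider accumulator dtype), then dequantize.
scores_int = q_int8.astype(np.int32) @ k_int8.astype(np.int32).T
scores = (scores_int * (q_scale * k_scale)).astype(np.float16)

print(np.abs(scores.astype(np.float32) - q @ k.T).max())  # quantization error
```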
-
Hi! This is great work. I have tried deploying a model with vLLM. The vLLM service starts normally, but the following error occurs when the service is invoked.
openai.BadRequestError: Er…
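For context, a typical client call against a vLLM OpenAI-compatible endpoint looks like the sketch below; the base URL, served model name, and prompt are placeholders, not the reporter's actual request:

```python
# Minimal client call against a vLLM OpenAI-compatible server.
# base_url, model, and the message content are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-served-model",  # must match the server's served model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```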
-
## dependency problem
install `vllm==0.3.2+cu118` through `pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${…
-
The vLLM server consistently crashes while processing lm-eval requests:
```
INFO 10-01 09:52:39 engine.py:288] Added request cmpl-270a6c19d13b4fb6aac151b9c8ba44c2-0.
ERROR 10-01 09:52:48 client.py:24…
-
### Your current environment
Python platform: Linux-5.10.213-201.855.amzn2.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU…
-
### Your current environment
```text
The output of `python collect_env.py`
WARNING 10-30 12:11:37 _custom_ops.py:19] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm.…
-
### Your current environment
(current environment is irrelevant because this is a replacement for the nightly build reference)
### How you are installing vllm
```sh
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checko…
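# (The checkout ref is truncated above; the usual continuation for a source
#  install would be something like the following. This is an assumption,
#  not the reporter's exact steps.)
pip install -e .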
-
### The vLLM docker image is
`intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1`
### vLLM start command is
```sh
model="/llm/models/Qwen2-72B-Instruct/"
served_model_name="Qwen2-72B…
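# (Truncated above; a launch built from these variables would look roughly
#  like the following. This is an assumption based on stock vLLM flags; the
#  ipex-llm image may use a different entrypoint.)
python -m vllm.entrypoints.openai.api_server \
  --model "$model" \
  --served-model-name "$served_model_name"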