-
The code is as follows:
```
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
max_model_len, tp_size = 131072, 1
model_name = "/models/codegeex4-all-9b"
tokenizer = AutoTokenizer.from_pr…
-
I have updated to the latest version and used the “spawn” method,
`export VLLM_WORKER_MULTIPROC_METHOD=spawn`
but the error still persists. Could you please help me?
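For reference, here is a minimal, self-contained sketch of the offline-inference pattern described above, using the model path and settings from the truncated snippet. The prompt and sampling values are placeholders, and the `if __name__ == "__main__":` guard is the usual requirement when the worker multiprocessing method is `spawn`; this is a sketch under those assumptions, not the reporter's exact script.
```python
# Hypothetical minimal repro: offline generation with vLLM using the "spawn"
# worker method. Prompt and sampling values are placeholders.
import os

# Must be set before vLLM creates its worker processes.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams


def main():
    model_name = "/models/codegeex4-all-9b"  # local path from the report
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

    llm = LLM(
        model=model_name,
        tensor_parallel_size=1,
        max_model_len=131072,
        trust_remote_code=True,
    )

    messages = [{"role": "user", "content": "Write a hello-world in Python."}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=256))
    print(outputs[0].outputs[0].text)


# With spawn, module-level code re-runs in every worker process,
# so keep engine construction under the main guard.
if __name__ == "__main__":
    main()
```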
-
### Your current environment
The output of `python collect_env.py`
```
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
I ran into the following issue:
INFO 09-22 21:48:03 api_server.py:495] vLLM API server version 0.6.1
INFO 09-22 21:48:03 api_server.py:496] args: Namespace(host='0.0.0.0', port=40116, uvicorn_log_level='…
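The log above comes from vLLM's OpenAI-compatible API server (host `0.0.0.0`, port `40116`). As a hedged, client-side smoke test against that endpoint, something like the sketch below should work; the model name is a placeholder (it must match the served model name or model path), and `api_key="EMPTY"` is just a dummy value for a server started without authentication.
```python
# Illustrative client check against the server in the log above; the model
# name is a placeholder, not taken from the report.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:40116/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="placeholder-model-name",  # must match the served model name
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=8,
)
print(resp.choices[0].message.content)
```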
-
### Your current environment
conda nccl v2.21.5.1
### 🐛 Describe the bug
I have 4 GPUs: three 3090s and one 2080 Ti 22G.
I'm trying to load cat llama 70b 5.0bpw EXL2 with Aphrodite. If I don't disable …
-
Hi, thanks for your awesome work!
I'm trying to implement https://github.com/SafeAILab/EAGLE with high-performance kernels. I read [this blog](https://flashinfer.ai/2024/02/02/introduce-flashinfer.…
-
```
http://static.electroteque.org.s3.amazonaws.com/download/apple-osmf.zip
Here is the refactored code as a library now with a working example of the m3u8
parsing and multi bitrate setup. I'm not s…
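The refactored library referenced above is an OSMF plugin, but the playlist side of "m3u8 parsing and multi bitrate setup" is language-agnostic. Below is a hedged Python sketch of reading the variant streams from an HLS master playlist; the URL is a placeholder, not the linked archive, and this is only an illustration of the parsing step, not the plugin's code.
```python
# Illustrative sketch: parse variant streams (#EXT-X-STREAM-INF entries)
# from an HLS master playlist to see the available bitrates.
import re
import urllib.request


def parse_master_playlist(text):
    """Return (bandwidth, uri) pairs from a master playlist, sorted by bandwidth."""
    variants = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF"):
            match = re.search(r"BANDWIDTH=(\d+)", line)
            bandwidth = int(match.group(1)) if match else 0
            # The variant's URI is the next non-empty, non-comment line.
            for uri in lines[i + 1:]:
                if uri and not uri.startswith("#"):
                    variants.append((bandwidth, uri))
                    break
    return sorted(variants)


if __name__ == "__main__":
    url = "https://example.com/master.m3u8"  # placeholder master playlist URL
    with urllib.request.urlopen(url) as resp:
        playlist = resp.read().decode("utf-8")
    for bandwidth, uri in parse_master_playlist(playlist):
        print(f"{bandwidth:>9} bps -> {uri}")
```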
-
The Qwen2 EAGLE model has now been uploaded to the HF repo. I can't wait to test its performance.
However, it seems that EAGLE's inference framework doesn't support Qwen2; when will it officially be support…
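As an alternative while waiting for official support, vLLM's speculative decoding path can take an EAGLE-style draft model. The sketch below shows how that is typically wired up; both model names are placeholders, whether the Qwen2 EAGLE head is accepted depends on version support (which is what this issue is asking about), and exact flags may differ between vLLM releases.
```python
# Hedged sketch, not a confirmed configuration: EAGLE-style draft head used
# as the speculative model in vLLM. Model names are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",                         # target model (placeholder)
    speculative_model="your-org/EAGLE-Qwen2-7B-Instruct",   # draft head (placeholder)
    num_speculative_tokens=4,
    # Depending on the vLLM version, use_v2_block_manager=True may also be required.
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```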