-
### Your current environment
```text
2024-05-07 01:43:26 (981 KB/s) - ‘collect_env.py’ saved [24877/24877]
Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: F…
-
Greetings, @cipher982!
Currently we are working on the OpenVINO inference framework, and such benchmarks are critical for understanding the gaps and differences between our framework and Transformers / TGI …
-
### 🚀 The feature, motivation and pitch
[Parallel/Jacobi decoding](https://arxiv.org/abs/2305.10427) improves inference efficiency by breaking the sequential nature of conventional auto-regressive …
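The core idea can be illustrated with a toy fixed-point iteration. This is a minimal sketch, not vLLM's or the paper's implementation: `next_token` is a hypothetical stand-in for a greedy language model (here just a running sum mod 10), and the point is only that all guessed positions update simultaneously per sweep instead of one token per model call.

```python
def next_token(prefix):
    # Hypothetical stand-in for an LM's greedy next-token choice:
    # deterministically maps a prefix to one token.
    return sum(prefix) % 10

def greedy_decode(prompt, n_new):
    """Conventional autoregressive decoding: n_new sequential calls."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n_new, max_iters=50):
    """Decode n_new tokens in parallel by fixed-point (Jacobi) iteration.

    Start from arbitrary guesses; each sweep recomputes every new token
    from the current guess of its prefix, all positions at once. One
    sweep corresponds to one parallel model call. At the fixed point the
    result matches greedy autoregressive decoding.
    """
    guesses = [0] * n_new
    for _ in range(max_iters):
        seq = list(prompt) + guesses
        updated = [next_token(seq[:len(prompt) + i]) for i in range(n_new)]
        if updated == guesses:  # fixed point: no token changed
            return guesses
        guesses = updated
    return guesses
```

Because at least one more prefix token becomes correct per sweep, convergence is guaranteed within `n_new + 1` sweeps, and often far fewer — which is where the speedup over purely sequential decoding comes from.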
-
### Your current environment
3MIO:~/vllm$ python collect_env.py
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM u…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
A question about Qwen:
Are the model files on ModelScope identical to the model files on Hugging Face?
Running the inference demo:
Qwen-VL# python web_demo_mm.py
produces the following message:
assert generation_config.chat_format == 'chatml', _ERROR_BAD_CHAT_FORMAT
AssertionError: We det…
-
Currently, multi-LoRA supports only the Llama and Mistral architectures. We should extend this functionality to all architectures.
The Yi, Qwen, Phi, and Mixtral architectures seem to be the most demanded r…
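For context, the mechanism being extended can be sketched as a linear layer carrying one shared base weight plus per-adapter low-rank deltas selected per request. This is an illustrative toy with NumPy, assuming a hypothetical `MultiLoRALinear` class; vLLM's real multi-LoRA path uses batched GPU kernels and is architecture-specific, which is exactly why each new architecture needs explicit support.

```python
import numpy as np

class MultiLoRALinear:
    """Toy multi-LoRA linear layer: y = W x + B_a (A_a x) for adapter a."""

    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in))  # shared base weight
        self.adapters = {}                       # name -> (A, B)

    def add_adapter(self, name, A, B):
        # A: (rank, d_in) down-projection, B: (d_out, rank) up-projection.
        self.adapters[name] = (A, B)

    def forward(self, x, adapter=None):
        y = self.W @ x
        if adapter is not None:
            A, B = self.adapters[adapter]
            y = y + B @ (A @ x)  # low-rank delta, applied per request
        return y
```

The per-request `adapter` argument is the key design point: many adapters share one set of base weights in memory, so serving N fine-tunes costs roughly one base model plus N small (A, B) pairs.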
-
On my RX 6800 I get `RuntimeError: FlashAttention only supports AMD MI200 GPUs or newer.` for some reason. I Googled that GPU and it seems to be RDNA2 like mine, but for enterprise. Is this not…
-
### Your current environment
```text
Collecting environment information...
/data/miniconda3_new/envs/vllm/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORM…
-
When I call the API '/v1/chat/completions' on the API Server to access the vllm_worker server, it returns incomplete results, but vLLM's own API returns complete results and the model_work server returns comp…