-
### Your current environment
```text
python3 collect_env.py
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM use…
```
-
### Motivation
KV cache hit rate is probably the biggest performance factor for me, and I recently read:
https://research.character.ai/optimizing-inference/
> To solve this problem, we deve…
-
Phi-3-medium-128k-instruct was quantized with AutoAWQ.
The quant config:
> quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
nothing changed in the quantize.py…
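For reference, a minimal sketch of the AutoAWQ flow with that quant config (the model path and output directory here are assumptions; adapt to your local quantize.py):

```python
# Sketch of the AutoAWQ quantization flow described above. The model path and
# output directory are assumptions, not taken from the report.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

def quantize_awq(model_path="microsoft/Phi-3-medium-128k-instruct",
                 out_dir="phi-3-medium-128k-awq"):
    # Imports kept inside the function: AutoAWQ needs a CUDA GPU at runtime.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model.quantize(tokenizer, quant_config=quant_config)  # 4-bit, group size 128
    model.save_quantized(out_dir)
    tokenizer.save_pretrained(out_dir)
```

Calling `quantize_awq()` on a CUDA machine produces an AWQ checkpoint in `out_dir`.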
-
vLLM uses paged memory and has kernels that perform the generation part of causal inference.
The computation pattern of the generation part - a single Q against the entire sequence length of KV - is very different f…
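To illustrate the pattern, a toy NumPy sketch of one decode step (a single query vector attending over the full cached KV, unlike prefill where many queries attend at once):

```python
import numpy as np

# Toy decode-step attention: one query token attends over the whole cached
# prefix. Shapes are illustrative, not tied to any particular model.
rng = np.random.default_rng(0)
d, seq_len = 64, 128

q = rng.standard_normal(d)             # single query vector (current token)
K = rng.standard_normal((seq_len, d))  # cached keys for the whole prefix
V = rng.standard_normal((seq_len, d))  # cached values for the whole prefix

scores = K @ q / np.sqrt(d)            # (seq_len,) one score per cached token
weights = np.exp(scores - scores.max())
weights /= weights.sum()               # softmax over the prefix
out = weights @ V                      # (d,) a single output token
```

The key point is the shape asymmetry: a `(d,)` query against `(seq_len, d)` caches, versus the `(seq_len, d) x (seq_len, d)` pattern of prefill.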
-
**Describe the bug**
I'm using the OpenAIGenerator to access a vLLM endpoint on RunPod. When using a base model like Mistral v0.3, which has not been instruction tuned and so does not have a chat templ…
-
### System Info
Ubuntu 22.04
one NVIDIA A800
driver: 470.141.10
CUDA: 12.3
TensorRT: 9.2.0.5
### Who can help?
_No response_
### Information
- [X] The official example scripts
- …
-
Hi,
Appreciate the great work!
If I want to test the performance of other models, how should I do that?
E.g., to test Llama 3 405B, what data format should I pass to your interface?
Thanks!
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### 🐛 Describe the bug
Hello,
On a container env I …
-
### Motivation
In online RL training, vLLM can significantly accelerate the rollout stage. To achieve this, we need to sync weights from the main training process to the vLLM worker process, and then call the e…
-
vLLM does not support AWQ-quantized models yet.
Please add one more parameter, e.g. `--quantization awq`.
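For illustration, serving an AWQ checkpoint might then look like this (the entrypoint module, flag name, and model repo here are assumptions, not confirmed vLLM behavior):

```shell
# Hypothetical invocation once AWQ support lands; flag and model are assumed.
python -m vllm.entrypoints.openai.api_server \
    --model TheBloke/Llama-2-7B-AWQ \
    --quantization awq
```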