-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
…
-
Great work so far. I'm trying to run vLLM on my 7900XTX cards and was wondering if there were any plans to support RDNA3?
-
(vllm) PS C:\Users\hub1\gen-ai-training-abshek\gen-ai-training\vllm-main> pip install -e .
Looking in indexes: http://art.nwie.net/artifactory/api/pypi/pypi/simple
Obtaining file:///C:/Users/hub1/ge…
-
### Your current environment
I am using vllm version 0.3.0
I am using this class `ChatCompletionRequest` to create the request for my chat completion endpoint
Whenever I set the `max_tokens` to an…
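For reference, a minimal sketch of how such a request could be issued against an OpenAI-compatible vLLM server, with `max_tokens` set. The URL and model name are placeholders (assumptions, not taken from the report above), and the actual network call is left commented out so the snippet runs without a live server:

```python
import json
import urllib.request

# Hypothetical endpoint; adjust to where your vLLM server is listening.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

# OpenAI-compatible chat completion payload; max_tokens caps how many
# tokens the server may generate for the reply.
payload = {
    "model": "my-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
}

req = urllib.request.Request(
    VLLM_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send the request; commented out here
# so the sketch is self-contained.
print(json.dumps(payload, indent=2))
```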
-
Fully supported! Scroll down on our latest Mistral notebook: https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing
For 16bit merging:
```
model.save_pretrained_mer…
-
It is not as useful as Qwen1 in real use cases, and this is not just a generic benchmark result.
Its math skills are poor.
e.g. (translated from Chinese): "Last year Alibaba's revenue was 534,785 wan yuan and Tencent's was 54,787 wan yuan. Which company had the higher revenue, and by how much?"
use:vllm 0.3.3
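For reference, the arithmetic the model is expected to get right (all figures in 万元, i.e. units of 10,000 yuan, as in the prompt):

```python
# Revenues from the test prompt, in units of 10,000 yuan (wan yuan).
alibaba = 534785
tencent = 54787

# Alibaba's revenue is higher; compute by how much.
difference = alibaba - tencent
print(difference)  # 479998 (wan yuan)
```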
-
### Your current environment
Using the latest Docker image.
Command:
docker run --runtime nvidia --gpus all -v /mnt_1/models:/models -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model /models/Q…
-
We should open a PR on the LangChain repo to add Outlines as a model / guided generation provider.
-
Hi, I'm curious about Next-DiT, it is not mentioned in your paper.
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…