-
### Your current environment
Referring to issue #5181, "The Offline Inference Embedding Example Fails": the method LLM.encode() works only for embedding models. Is there any way to get the ou…
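A minimal sketch of the embedding path the issue describes. The model name is an assumption (any embedding model vLLM supports would do), and running it requires a GPU-enabled vLLM install; the cosine helper is plain Python added for illustration.

```python
# Sketch: LLM.encode() works only when the loaded model is an embedding
# model. The model name below is an assumption for illustration.
import math

def cosine_similarity(a, b):
    """Pure-Python cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def embed(prompts):
    from vllm import LLM  # imported lazily: needs a GPU environment
    llm = LLM(model="intfloat/e5-mistral-7b-instruct")  # assumed model
    outputs = llm.encode(prompts)  # fails for non-embedding models
    return [o.outputs.embedding for o in outputs]

if __name__ == "__main__":
    vecs = embed(["first sentence", "second sentence"])
    print(cosine_similarity(vecs[0], vecs[1]))
```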
-
Currently I start the OpenAI API server with the command:
```
python3 -m vllm.entrypoints.openai.api_server --model Llama3-8B-Instruct --dtype auto --host 0.0.0.0 --port 8051 --gpu-memory-utilization 0.8 --en…
```
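A server started this way speaks the OpenAI-compatible API, so it can be queried over HTTP. A hedged stdlib-only sketch, assuming the host/port from the command above and that the model name passed to `--model` is what the request must reference:

```python
# Sketch: query the OpenAI-compatible endpoint exposed by the command above.
# Endpoint path and payload shape follow the OpenAI chat-completions API.
import json
import urllib.request

def build_chat_request(model, user_message, max_tokens=128):
    """Build the JSON body for a /v1/chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

def send(base_url, body):
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Host and port taken from the launch command; the server must be running.
    body = build_chat_request("Llama3-8B-Instruct", "Hello!")
    print(send("http://0.0.0.0:8051", body))
```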
-
Hi team,
I would like to use the LogitsPostProcessor in the [C++ Executor API](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/executor/executor.h) to control the generatio…
-
### Your current environment
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu Jammy Jellyfish (development branch…
-
### Your current environment
Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubunt…
-
Hi,
I just came from axolotl and I'm impressed! I get 10x faster phi3-mini-4k, and I can run the adapter in vLLM (I couldn't with axolotl; vLLM says the lm_head module is not supported).
**Questi…
-
### Your current environment
Hello,
when the Python Wheel is installed according to your documentation:
https://docs.vllm.ai/en/latest/getting_started/installation.html#install-with-pip
The imag…
ch9hn, updated 3 months ago
-
### Your current environment
```
(vllm-gptq) root@k8s-master01:/workspace/home/lich/QuIP-for-all# pip3 list | grep aphrodite
aphrodite-engine 0.5.3 /workspace/home/lich/aphrodite-eng…
```
-
Some users may need to send batch requests with several prompt/schema pairs. This is possible with the vLLM server integration using `aiohttp`, and we should document it.
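A hedged sketch of what that documentation might show: firing one request per prompt/schema pair concurrently with `aiohttp`. The server URL, model name, and the `guided_json` parameter name are assumptions for illustration, not confirmed by this issue; the payload builder is pure Python.

```python
# Sketch: batch prompt/schema pairs against a vLLM OpenAI-compatible server.
# URL, model name, and `guided_json` field are assumptions for illustration.
import asyncio
import json

def build_batch(pairs, model):
    """Turn (prompt, schema) pairs into one request body each."""
    return [
        {"model": model, "prompt": prompt, "guided_json": schema}
        for prompt, schema in pairs
    ]

async def post_all(base_url, bodies):
    import aiohttp  # third-party; imported lazily so the builder stays pure
    async with aiohttp.ClientSession() as session:
        async def post(body):
            async with session.post(f"{base_url}/v1/completions",
                                    json=body) as r:
                return await r.json()
        # Send all requests concurrently and gather the JSON responses.
        return await asyncio.gather(*(post(b) for b in bodies))

if __name__ == "__main__":
    pairs = [("Name a city.", {"type": "object"}),
             ("Name a color.", {"type": "object"})]
    results = asyncio.run(post_all("http://localhost:8000",
                                   build_batch(pairs, "Llama3-8B-Instruct")))
    print(json.dumps(results, indent=2))
```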
rlouf, updated 4 months ago
-
I just found a recent blog (https://vllm.ai/) and repo (https://github.com/vllm-project/vllm) that implement paged attention. I tested it out and it provides massive throughput and memory efficiency improv…