-
Hi, I want to use vLLM during evaluation, but when I set --vllm it shows an OOM error. My GPU is an A6000 and the model under evaluation is 7B. I can evaluate my model on mt-benchmark with vLLM. …
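A common way to work around this kind of OOM with vLLM is to reduce the fraction of GPU memory it pre-allocates and to cap the context length so the KV cache stays small. The sketch below uses the standard `vllm.LLM` constructor; the checkpoint name and the specific values are placeholders to adapt, not a fix verified against this harness's `--vllm` flag.
```python
from vllm import LLM, SamplingParams

# Sketch only: placeholder 7B checkpoint and conservative memory settings.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",   # hypothetical model under evaluation
    dtype="half",                        # fp16 weights instead of fp32
    gpu_memory_utilization=0.85,         # pre-allocate less of the A6000's VRAM
    max_model_len=4096,                  # cap sequence length -> smaller KV cache
)

outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```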
-
Chu has merged inference code for models quantized with QuIP# into vLLM (https://github.com/chu-tianxiang/vllm-gptq), but the inference code currently only supports tensor_parallel_size=1. The reason is "Ha…
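For reference, loading such a checkpoint in that fork presumably looks like ordinary vLLM usage with a `quantization` argument; the value `"quip"` below is an assumption about the fork's naming, and the model path is a placeholder.
```python
from vllm import LLM

# Assumption: the vllm-gptq fork selects the QuIP# kernels via quantization="quip";
# check the fork's README for the actual string. Single GPU only for now.
llm = LLM(
    model="path/to/quip-sharp-quantized-7b",  # hypothetical checkpoint
    quantization="quip",
    tensor_parallel_size=1,
)
```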
-
### System Info
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: N/A
Docker label: N/A
nvidia-smi:
```
+-----------------------------------------------------------------…
-
Opening this issue to track the progress of model support in candle-vllm.
-
Running vLLM according to the instructions. Docker segfaults at startup, so I'm running directly on the machine.
Starting the server with the following shell script. As you can see, I've tried to turn max…
-
Are there docs on best practices for using vLLM-hosted models?
I start the server with
python -m vllm.entrypoints.openai.api_server --model model_path
and try running it as
lm_eval --model lo…
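One alternative worth noting: lm-evaluation-harness also ships a `vllm` backend that loads the model in-process, so no HTTP server is needed at all. A minimal sketch, assuming lm-eval >= 0.4 and that the backend accepts these `model_args` keys:
```python
import lm_eval

# In-process vLLM backend; the model path and model_args keys are assumptions
# to adapt, not a verified recipe for the setup described above.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=model_path,dtype=half,gpu_memory_utilization=0.8",
    tasks=["hellaswag"],
    batch_size="auto",
)
print(results["results"])
```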
-
I want to try DSPy with a local LLM served by vLLM. I followed the instructions at https://dspy-docs.vercel.app/docs/deep-dive/language_model_clients/local_models/HFClientVLLM. The model was down…
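For context, the setup described in those docs boils down to pointing `HFClientVLLM` at a running vLLM server. A minimal sketch, assuming a DSPy version that still ships `dspy.HFClientVLLM` and a server already listening on localhost:8000:
```python
import dspy

# Hypothetical model name and default server location; adjust to your deployment.
vllm_lm = dspy.HFClientVLLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    port=8000,
    url="http://localhost",
)
dspy.settings.configure(lm=vllm_lm)

# Smoke test: one completion through the configured client.
print(vllm_lm("Say hello in one short sentence."))
```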
-
### Motivation
Speculative decoding can speed up generation by more than 2x. This degree of speedup is an important feature for a production-grade LM deployment library, and it seems the methods are s…
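To make the source of the speedup concrete, here is a toy, model-free sketch of the greedy draft-and-verify idea: a cheap draft model proposes k tokens, the target model checks them (in a real engine this verification is one batched forward pass), and the longest agreeing prefix is accepted. Real implementations accept or reject tokens probabilistically; the two lambdas below are dummy stand-ins, not actual models.
```python
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     k: int = 4) -> List[int]:
    # 1) Draft model proposes k tokens autoregressively (cheap).
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) Target model verifies the k positions: accept until the first mismatch
    #    (correcting it), or emit one bonus token if everything was accepted.
    accepted, ctx = [], list(prefix)
    for t in draft:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)
            break
        accepted.append(t)
        ctx.append(t)
    else:
        accepted.append(target_next(ctx))

    return prefix + accepted

# Dummy "models": the draft agrees with the target except at every 5th position.
target = lambda ctx: (len(ctx) * 7) % 100
draft = lambda ctx: target(ctx) if len(ctx) % 5 else (target(ctx) + 1) % 100
print(speculative_step([1, 2, 3], draft, target, k=4))  # -> [1, 2, 3, 21, 28, 35]
```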
-
### 🥰 Feature Description
Can vLLM be supported?
### 🧐 Proposed Solution
Can vLLM be supported?
### 📝 Additional Information
_No response_
-
Currently, vLLM's `vllm.worker.worker.Worker` is replaced with `openrlhf.trainer.ray.vllm_worker_wrap.WorkerWrap` on the fly as a monkey patch.
The monkey patch could be avoided by making `init_process_gro…
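For readers unfamiliar with the pattern, this is the general shape of such a patch (illustrative only, not OpenRLHF's actual code; the two import paths are taken from the issue text):
```python
# Swap the class attribute on the module before any vLLM engine is constructed,
# so every worker Ray spawns is an instance of WorkerWrap instead of Worker.
import vllm.worker.worker
from openrlhf.trainer.ray.vllm_worker_wrap import WorkerWrap

vllm.worker.worker.Worker = WorkerWrap
```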