-
## v0.3.0 openai.api_server fails for Mixtral-8x7B: FileNotFoundError
### Description
* vLLM v0.3.0 openai.api_server fails for Mixtral-8x7B: FileNotFoundError
* vLLM v0.2.7 openai.api_server w…
-
Microsoft has claimed that "Splitwise" is supported in vLLM; see
https://www.microsoft.com/en-us/research/blog/splitwise-improves-gpu-usage-by-splitting-llm-inference-phases/
![image](https://githu…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how …
-
Hi, I am trying to evaluate the model RLHFlow/LLaMA3-iterative-DPO-final with MT-Bench. I use the inference environment described in the README and follow the scripts from https://github.com/lm-sys/FastChat/tree/ma…
-
I used [Skypilot docs](https://skypilot.readthedocs.io/en/latest/examples/docker-containers.html) and [Mistral docs](https://docs.mistral.ai/self-deployment/skypilot/) to create this YAML:
```
res…
-
## Description
Do you intend to add [Attention Sinks](https://github.com/huggingface/transformers/commit/633215ba58fe5114d8c8d32e415a04600e010701) streaming as an alternative to the current impleme…
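To make the request concrete, here is a minimal sketch of the attention-sink cache policy the linked commit is about: keep the first few KV-cache entries (the "sinks") plus a sliding window of the most recent entries, evicting everything in the middle. The function name and parameters below are illustrative, not vLLM's or transformers' actual API.

```python
def evict_kv_cache(cache, n_sink=4, window=8):
    """Attention-sink eviction sketch: retain the first `n_sink` entries
    (the attention sinks) plus the most recent `window` entries, and
    drop the middle so the cache stays bounded during streaming."""
    if len(cache) <= n_sink + window:
        return list(cache)  # nothing to evict yet
    return list(cache[:n_sink]) + list(cache[-window:])

# Token positions 0..19 with 4 sinks and a window of 8:
# positions 0-3 survive as sinks, plus the last 8 positions 12-19.
kept = evict_kv_cache(list(range(20)), n_sink=4, window=8)
```

The point of keeping the initial tokens is that streaming models attend heavily to them; evicting them degrades quality, while evicting only the middle keeps memory constant.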
-
Hello!
Thank you for releasing this extensive code base!
I was wondering: is there any way to avoid Ray when running some of the attacks, like PAIR, on a single node? (Ray is unusable on my end.)
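As a possible workaround, single-node fan-out can often be done with the standard library instead of Ray. The sketch below is an assumption about the workload shape: `run_attack` is a hypothetical stand-in for one attack iteration, not a function from this codebase.

```python
from concurrent.futures import ThreadPoolExecutor

def run_attack(prompt):
    # Hypothetical stand-in for a single attack run (e.g. one PAIR trial).
    return f"result-for-{prompt}"

def run_all(prompts, max_workers=4):
    # Ray-free single-node parallelism: a local thread pool maps the
    # prompts across workers and preserves input order in the results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_attack, prompts))
```

If the per-task work is CPU-bound Python rather than I/O or GPU calls, `ProcessPoolExecutor` is the drop-in alternative.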
…
-
One feature that would be great is LangChain support for agents or chains. Even if it were only a LangServe RemoteRunnable, it would be awesome to be able to leverage LangChain agents, tools, etc.
-
As per the title, the completions API is invoked with max_tokens = 0, which, if properly interpreted by the server, will cause it not to generate anything (according to the [API documentation](https:/…
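To illustrate the expected semantics, here is a toy decode loop (not vLLM's actual implementation) showing how a server honoring `max_tokens = 0` would return an empty completion rather than erroring or generating unboundedly:

```python
def complete(prompt, max_tokens):
    """Toy completion loop: max_tokens=0 must yield an empty completion,
    not an error and not an unlimited generation."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be >= 0")
    out = []
    for step in range(max_tokens):  # range(0) runs zero iterations
        out.append(f"tok{step}")    # stand-in for sampling one token
    return " ".join(out)

# complete("hello", 0) returns "" — the server generates nothing.
```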
-
We're having trouble running inference efficiently at scale. By default we process the audio parts one by one, but is there any support for batch inference to speed th…
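In the absence of built-in batching, one common pattern is to group the segments yourself and hand each group to a single model call. The sketch below assumes a hypothetical `transcribe_batch` callable that accepts a list of audio segments and returns one transcript per segment; it is not an API from any specific library.

```python
def chunked(items, batch_size):
    # Yield fixed-size batches; the final batch may be smaller.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def transcribe_all(segments, transcribe_batch, batch_size=8):
    # `transcribe_batch` is a hypothetical model call taking a list of
    # segments and returning a list of transcripts in the same order.
    results = []
    for batch in chunked(segments, batch_size):
        results.extend(transcribe_batch(batch))
    return results
```

Batching amortizes per-call overhead and lets the model pad and process several inputs in one forward pass, which is where most of the speedup at scale comes from.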