-
Hello,
While using the ELI5 and TriviaQA datasets from the Hugging Face library, I encountered errors caused by documents that are missing from the corpus. I experienced a similar issue …
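For reference, a minimal sketch of loading one of these datasets through the Hugging Face `datasets` library; the config and split names are my assumptions, and the retrieval corpus the report refers to is separate from the QA pairs themselves:

```python
from datasets import load_dataset

# Load TriviaQA in its no-context reading-comprehension config
# (config/split names are assumptions, not taken from the report).
trivia = load_dataset("trivia_qa", "rc.nocontext", split="validation")
print(trivia[0]["question"])
```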
-
Hi, using vLLM 0.10.3 with the llama3 tokenizer, I can't seem to constrain generation to emojis.
```
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --hea…
```
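For context, a minimal reproduction sketch in Python against a local vLLM OpenAI-compatible server, using vLLM's `guided_choice` guided-decoding request field; the model name and the emoji choices are placeholders, not taken from the report:

```python
import requests

# Ask the server to constrain the reply to one of a few emoji strings.
# Model name and choices below are illustrative placeholders.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Reply with one emoji."}],
        "guided_choice": ["😀", "😢", "👍"],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```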
-
Noob here: does this mean no Mac support? "AssertionError: vLLM only supports Linux platform (including WSL)."
-
Hi, when I follow the default steps to set up the environment:
`pip install vllm`
it automatically installs vllm 0.5.0.post1, which requires transformers>=4.40.0.
When installing SPPO ( transformer…
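A quick way to surface the conflict described above (assuming SPPO pins an older transformers, which is what the truncated line suggests) is to print both installed versions:

```python
from importlib.metadata import version

# vllm 0.5.0.post1 requires transformers>=4.40.0; a package that pins an
# older transformers cannot coexist with it in the same environment.
for pkg in ("vllm", "transformers"):
    print(pkg, version(pkg))
```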
-
I wonder, can a quanto-quantized model be used with vLLM?
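For comparison, a minimal sketch of how a quantization backend is normally selected in vLLM's offline API; the model name is a placeholder, and as far as I know quanto is not among the supported backends, so a quanto checkpoint would likely need re-quantizing with a supported method first:

```python
from vllm import LLM, SamplingParams

# vLLM selects its quantization kernels via the `quantization` argument;
# "awq" and "gptq" are supported values (quanto, to my knowledge, is not).
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```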
-
I'm trying to run/load prometheus on Amazon SageMaker Studio notebooks but keep running into errors.
If I load it using VLLM
`model = VLLM(model="prometheus-eval/prometheus-7b-v2.0")`
`ValueErro…
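Since the ValueError is truncated, here is only a hedged sketch of how the LangChain `VLLM` wrapper is typically instantiated, with the memory-related knobs that often matter on notebook-class GPUs; all parameter values are assumptions:

```python
from langchain_community.llms import VLLM

# gpu_memory_utilization and max_model_len are forwarded to vllm.LLM via
# vllm_kwargs; the values here are illustrative, not taken from the report.
llm = VLLM(
    model="prometheus-eval/prometheus-7b-v2.0",
    trust_remote_code=True,
    max_new_tokens=256,
    vllm_kwargs={"gpu_memory_utilization": 0.9, "max_model_len": 4096},
)
print(llm.invoke("Rate the following response: ..."))
```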
-
### Feature Description
```
from llama_index.core.llms.vllm import VllmServer
from llama_index.core.llms import ChatMessage
llm = VllmServer(api_url="http://localhost:8000", max_new_tokens=8000, temp…
```
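As a point of reference, a hypothetical completion of the snippet above (the import paths and api_url are copied from the report; whether chat-style calls work against VllmServer is exactly what the request is about):

```python
from llama_index.core.llms.vllm import VllmServer  # import path as given in the report
from llama_index.core.llms import ChatMessage

llm = VllmServer(api_url="http://localhost:8000", max_new_tokens=256)
print(llm.complete("Hello!"))                                  # plain completion
print(llm.chat([ChatMessage(role="user", content="Hello!")]))  # chat-style call
```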
-
Hello, nice work and very helpful! Does this support vLLM for fast generation?
-
Hello, in recent tests on an A100 I benchmarked models such as Llama-13b and 7b, comparing vLLM and DistServe; with the SLO satisfied, DistServe outperforms vLLM. But when testing codellama-34b with an input length of 8192, I found TTFT to be roughly 3x higher than vLLM's. Is this expected? vLLM uses tp2; DistServe uses prefill tp2 and decode tp2.
-
While working on the addition of vLLM in https://github.com/instructlab/instructlab/pull/1442, I tried adding a functional test to the e2e tests since the runner has a CUDA GPU. Unfortunately, it does not have en…