-
### Feature Description
```
from llama_index.core.llms.vllm import VllmServer
from llama_index.core.llms import ChatMessage
llm = VllmServer(api_url="http://localhost:8000", max_new_tokens=8000, temp…
```
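A minimal sketch of what a full setup might look like, assuming the import path above resolves in the installed llama_index version and that VllmServer exposes the standard llama_index chat interface; the temperature value and prompt are illustrative:

```
from llama_index.core.llms.vllm import VllmServer
from llama_index.core.llms import ChatMessage

# Assumed: a vLLM API server is already running on localhost:8000.
llm = VllmServer(
    api_url="http://localhost:8000",
    max_new_tokens=8000,
    temperature=0.7,  # illustrative; the original snippet is truncated here
)

# Standard llama_index chat call with a single user message.
response = llm.chat([ChatMessage(role="user", content="Summarize what vLLM does in one sentence.")])
print(response)
```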
-
Solid idea and ingenious code implementation, great work!
Have you considered implementing KV compression on the KV cache in the vLLM framework?
-
Hello, nice work and very helpful! Does this support vllm for fast generation?
-
**Describe the bug**
I'm hitting an illegal memory access in https://github.com/vllm-project/vllm/pull/5917 when setting fuse_reduction=False in the fused GEMM+ReduceScatter kernel.
**To Reproduce…
-
Hello, in recent tests I benchmarked Llama-13b, 7b, and similar models on an A100, comparing vllm and distserve. When the SLO is met, distserve outperforms vllm. However, when testing codellama-34b with an input length of 8192, I found that TTFT is about 3x higher than vllm's. Is this expected? vllm uses tp2; distserve uses prefill tp2 and decode tp2.
-
**Describe the bug**
After changing the configuration in config.yaml and running 'ilab xxx --help', the defaults shown are not consistent with config.yaml. E.g., after changing the default serve model to mixtral, the help message st…
-
Is there any way to deploy a multimodal model like videochat2 with vllm? vllm does not seem to support embedding inputs at the moment.
-
While working on the addition of vLLM https://github.com/instructlab/instructlab/pull/1442, I tried adding func test to the e2e test since the runner has a CUDA GPU. Unfortunately, it does not have en…
-
I'm trying to implement a control vector in the vllm codebase for the mixtral model, but I was wondering where I should add the control vector in the layer. Should it be added before attention, the fully connecte…
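As a general point of reference (not specific to vLLM's internal code paths), control/steering vectors are usually added to the residual stream at the output of a whole decoder layer, after both the attention and MLP blocks, rather than inside the attention block itself. A rough PyTorch-style sketch of the idea, with all module names hypothetical:

```
def decoder_layer_with_control(hidden_states, layer, control_vector=None, scale=1.0):
    # hidden_states: torch.Tensor of shape (batch, seq_len, hidden_size)
    # Standard pre-norm decoder layer: self-attention and MLP/MoE blocks,
    # each wrapped in a residual connection.
    residual = hidden_states
    hidden_states = residual + layer.self_attn(layer.input_layernorm(hidden_states))

    residual = hidden_states
    hidden_states = residual + layer.mlp(layer.post_attention_layernorm(hidden_states))

    # Control vector added to the residual stream at the layer output,
    # broadcast over the batch and sequence dimensions.
    if control_vector is not None:
        hidden_states = hidden_states + scale * control_vector

    return hidden_states
```

In a Mixtral-style model the MLP block is the mixture of experts, but the control-vector addition would sit in the same place, on the summed residual output of the layer.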
-
I'm wondering if I can get an easier pipeline by loading the awq weights with vllm:
```
from vllm import LLM, SamplingParams
prompts = [
"Hello, my name is",
"The president of the Uni…