-
Hi, thanks for the great work!
What if I want to support a larger model, say one that exceeds a single GPU card's memory and therefore needs tensor parallelism (TP)? Is there a reason why qserve [doesn't support tp](https://github.com/mit-han-…
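For readers unfamiliar with TP, here is a minimal single-process sketch of what column-parallel sharding does (plain PyTorch, not qserve code; shapes are illustrative):

```python
import torch

# A linear layer's weight is split column-wise across two "devices",
# each shard computes a partial output, and the partial outputs are
# concatenated. In a real TP setup the shards live on different GPUs
# and the concatenation is an all-gather collective.
x = torch.randn(4, 512)        # batch of activations
w = torch.randn(512, 1024)     # full weight (too big for one card, in spirit)
w0, w1 = w.chunk(2, dim=1)     # column-parallel shards

y0 = x @ w0                    # would run on GPU 0
y1 = x @ w1                    # would run on GPU 1
y = torch.cat([y0, y1], dim=1) # "all-gather" along the hidden dim

assert torch.allclose(y, x @ w, atol=1e-4)
```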
-
People are curious about LLMs. It would be nice if we could go through the lifecycle that we expect other groups with large data corpora to go through. We have terabytes of GitHub data, the textual na…
-
Hi, can you please provide a guide or support for using local LLM models via Ollama, such as Llama 3.1 8B or 70B?
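For reference, Ollama exposes an OpenAI-compatible endpoint, so the standard `openai` client can talk to a local model. A minimal sketch, assuming `ollama pull llama3.1` has been run and the daemon is listening on its default port:

```python
from openai import OpenAI

# Point the client at Ollama's local OpenAI-compatible endpoint;
# the api_key is required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1",  # or "llama3.1:70b" if you pulled the larger variant
    messages=[{"role": "user", "content": "Summarize tensor parallelism."}],
)
print(resp.choices[0].message.content)
```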
-
Currently, every GitHub project, and especially the ones that come under CNCF, uses independent processes for issue triage, bot replies, and so on. At a broad level, the following patterns arise where proj…
-
### Your current environment
docker with vllm/vllm-openai:v0.4.3 (latest)
### 🐛 Describe the bug
python3 -m vllm.entrypoints.openai.api_server --model ./Qwen1.5-72B-Chat/ --max-model-len 2400…
-
Is there any performance comparison data between ScaleLLM and vLLM?
-
Good job!
Hoping to see comparisons with different frameworks on some models, covering throughput, time to first token, and so on.
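For anyone wanting to run such a comparison themselves, here is a rough sketch that measures time-to-first-token and decode rate against any OpenAI-compatible server (the endpoint and model name are placeholders, and streamed chunks only approximate tokens):

```python
import time
from openai import OpenAI

# Works against vLLM, ScaleLLM, or any other OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
first_token_at = None
n_chunks = 0
stream = client.chat.completions.create(
    model="my-model",  # placeholder served model name
    messages=[{"role": "user", "content": "Explain KV caching."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_chunks += 1
end = time.perf_counter()

if first_token_at is not None and n_chunks > 1:
    print(f"TTFT: {first_token_at - start:.3f}s")
    print(f"~{n_chunks / (end - first_token_at):.1f} chunks/s decode rate")
```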
-
Co-authored with @SolitaryThinker @Yard1 @rkooo567
We are landing multi-step scheduling (#7000) to amortize scheduling overhead for better inter-token latency (ITL) and throughput. Since the first version of multi-step…
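As a hedged illustration, in builds that include #7000 the feature is controlled by the `num_scheduler_steps` engine argument; assuming the `LLM` constructor forwards it to `EngineArgs`, usage looks roughly like:

```python
from vllm import LLM, SamplingParams

# num_scheduler_steps > 1 lets the engine run several decode steps per
# scheduler invocation, amortizing scheduling overhead (assumption: this
# keyword is forwarded to EngineArgs in builds that include #7000).
llm = LLM(model="facebook/opt-125m", num_scheduler_steps=8)
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```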
-
### The vLLM docker image is
`intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1`
### vLLM start command is
model="/llm/models/Qwen2-72B-Instruct/"
served_model_name="Qwen2-72B…
-
# Repo links
https://github.com/THUDM/ChatGLM-6B
https://github.com/mymusise/ChatGLM-Tuning
https://github.com/LianjiaTech/BELLE
## LLM quantization
https://zhuanlan.zhihu.com/p/616969812
- [SmoothQuant](htt…
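Since SmoothQuant is referenced above, a minimal sketch of its core scale-migration trick (alpha and shapes are illustrative, not taken from the paper's configurations):

```python
import torch

# SmoothQuant migrates quantization difficulty from activations to
# weights with a per-input-channel scale
#   s_j = max|X_j|**alpha / max|W_j|**(1 - alpha),
# then rewrites X' = X / s and W' = s * W so the layer output is unchanged.
alpha = 0.5
x = torch.randn(128, 512) * torch.rand(512) * 10  # activations with outlier channels
w = torch.randn(512, 1024)                        # weight (in_features x out_features)

act_max = x.abs().amax(dim=0)   # per-input-channel activation max
w_max = w.abs().amax(dim=1)     # per-input-channel weight max
s = act_max.pow(alpha) / w_max.pow(1 - alpha)

x_smooth = x / s                # outlier channels damped, easier to quantize
w_smooth = w * s.unsqueeze(1)   # scale folded into the weight offline

# The matrix product is preserved, so the layer's math is unchanged.
assert torch.allclose(x @ w, x_smooth @ w_smooth, rtol=1e-4, atol=1e-3)
```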