OpenCSGs / llm-inference
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
Apache License 2.0 · 69 stars · 17 forks
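The description above mentions a RESTful API, and issue #52 in the list below tracks adding OpenAI-style support to it. As a rough illustration only, a chat-completion call against such a server might look like the sketch below; the host, port, route, and use of the model name are assumptions for illustration, not the project's documented interface.

```python
# Hypothetical sketch: host, port, route, and payload shape are assumptions,
# not llm-inference's documented API. The model name is taken from the issue
# list below (e.g. #44 "Add Qwen1.5-72B-chat") purely as an example.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",   # assumed OpenAI-style route
    json={
        "model": "Qwen1.5-72B-Chat",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,                            # streaming is a recurring topic in the issues
    },
    timeout=60,
)
# Standard OpenAI-style response shape, assuming the server follows it.
print(resp.json()["choices"][0]["message"]["content"])
```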
Issues
#93 · remove deprecated params: stream · depenglee1707 · closed · 6 months ago · 0 comments
#92 · support "revision" in yaml defination · depenglee1707 · closed · 6 months ago · 2 comments
#91 · Support streaming in vllm integration · depenglee1707 · closed · 6 months ago · 0 comments
#90 · UI not support static batch · depenglee1707 · closed · 6 months ago · 0 comments
#89 · fix issue: loading from local folder · depenglee1707 · closed · 6 months ago · 0 comments
#88 · fix issue: vllm cannot address runtime_env · depenglee1707 · closed · 6 months ago · 0 comments
#87 · vllm cannot address "runtime_env" · depenglee1707 · closed · 6 months ago · 1 comment
#86 · Refine description of repo · depenglee1707 · closed · 6 months ago · 0 comments
#85 · adopt streaming for ui with text-generation downstream task · depenglee1707 · closed · 6 months ago · 0 comments
#84 · fix issue: non-support streaming pipeline cannot work when call it as stream · depenglee1707 · closed · 6 months ago · 0 comments
#83 · enhance llamacpp integration to share soma logic between streaming and predict · depenglee1707 · closed · 6 months ago · 1 comment
#82 · Refactor streaming · depenglee1707 · closed · 6 months ago · 0 comments
#81 · Fix prompt is not string bug · SeanHH86 · closed · 6 months ago · 1 comment
#80 · fix issue: stream generation is slow · depenglee1707 · closed · 6 months ago · 0 comments
#79 · enhance name of router for comparation scenario · depenglee1707 · closed · 6 months ago · 0 comments
#78 · Fix path params issue, make interface consistent · depenglee1707 · closed · 6 months ago · 0 comments
#77 · update log · SeanHH86 · closed · 6 months ago · 0 comments
#76 · Updata logs · SeanHH86 · closed · 6 months ago · 0 comments
#75 · Fix stream without prompt format · SeanHH86 · closed · 6 months ago · 0 comments
#74 · fix generate bug for stream api of llamacpp · SeanHH86 · closed · 6 months ago · 0 comments
#73 · correct vllm version · depenglee1707 · closed · 6 months ago · 0 comments
#72 · Failed to load qwen1_5-72b-chat-q5_k_m.gguf · SeanHH86 · closed · 6 months ago · 3 comments
#71 · add Qwen1.5-72B-GGUF yaml and fix load json input error · SeanHH86 · closed · 6 months ago · 1 comment
#70 · Make scale out policy consistent between deployments · depenglee1707 · closed · 6 months ago · 0 comments
#69 · keep removing deprecated stuff · depenglee1707 · closed · 6 months ago · 0 comments
#68 · Support load Qwen1.5-72B-Chat-GPTQ-Int4 by auto_gptq · SeanHH86 · opened · 6 months ago · 1 comment
#67 · Model streaming API enhancement · SeanHH86 · closed · 6 months ago · 2 comments
#66 · add streaming API support · SeanHH86 · closed · 6 months ago · 0 comments
#65 · Enable chat template applied for vllm integration · depenglee1707 · closed · 7 months ago · 0 comments
#64 · update Qwen1.5-72B yaml · SeanHH86 · closed · 7 months ago · 0 comments
#63 · Fix json format issue for "transformerpipeline" · depenglee1707 · closed · 7 months ago · 0 comments
#62 · fix load json data with '\n' failed · SeanHH86 · closed · 7 months ago · 0 comments
#61 · Remove the original implements for vllm integration · depenglee1707 · closed · 7 months ago · 0 comments
#60 · Refactor the solution of vllm integration · depenglee1707 · closed · 7 months ago · 0 comments
#59 · Install dependency llama-cpp-python failed · SeanHH86 · opened · 7 months ago · 4 comments
#58 · remove useless stuff · depenglee1707 · closed · 7 months ago · 0 comments
#57 · enable prompt template for gguf format inference · depenglee1707 · closed · 7 months ago · 0 comments
#56 · Update ray to 2.9.3 · SeanHH86 · closed · 7 months ago · 0 comments
#55 · Expose model generate parameters by API server · SeanHH86 · opened · 7 months ago · 0 comments
#54 · Enable chat template for huggingface transformer · depenglee1707 · closed · 7 months ago · 1 comment
#53 · Generate incorrect text format when use pipeline defaulttransformers · SeanHH86 · closed · 7 months ago · 2 comments
#52 · Enhance inference API to support OpenAI style · SeanHH86 · closed · 5 months ago · 3 comments
#51 · enable "use_bettertransformer" and "torch_compile" · depenglee1707 · closed · 7 months ago · 0 comments
#50 · fix output issue 4 ui · depenglee1707 · closed · 7 months ago · 0 comments
#49 · fix max-token conflict w/ DS · depenglee1707 · closed · 7 months ago · 0 comments
#48 · No default value for "timeout" if missing "batch_wait_timeout_s: 0" in yaml config · depenglee1707 · opened · 7 months ago · 1 comment
#47 · enable deepspeed inference · depenglee1707 · closed · 7 months ago · 0 comments
#46 · add parameter for timeout · SeanHH86 · closed · 7 months ago · 0 comments
#45 · Inference throw timeout sometime · SeanHH86 · closed · 7 months ago · 1 comment
#44 · Add Qwen1.5-72B-chat · SeanHH86 · closed · 7 months ago · 0 comments