OpenCSGs / llm-inference
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
Apache License 2.0 · 69 stars · 17 forks
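The description above mentions a RESTful API, and issue #52 in the list below tracks adding OpenAI-style support to it. As a rough illustration only, a chat-completion call against such a server might look like the sketch below; the host, port, route, and use of the model name are assumptions for illustration, not the project's documented interface.

```python
# Hypothetical sketch: host, port, route, and payload shape are assumptions,
# not llm-inference's documented API. The model name is taken from the issue
# list below (e.g. #44 "Add Qwen1.5-72B-chat") purely as an example.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",   # assumed OpenAI-style route
    json={
        "model": "Qwen1.5-72B-Chat",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,                            # streaming is a recurring topic in the issues
    },
    timeout=60,
)
# Standard OpenAI-style response shape, assuming the server follows it.
print(resp.json()["choices"][0]["message"]["content"])
```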
Issues
#93 · remove deprecated params: stream · depenglee1707 · closed · 6 months ago · 0 comments
#92 · support "revision" in yaml defination · depenglee1707 · closed · 6 months ago · 2 comments
#91 · Support streaming in vllm integration · depenglee1707 · closed · 6 months ago · 0 comments
#90 · UI not support static batch · depenglee1707 · closed · 6 months ago · 0 comments
#89 · fix issue: loading from local folder · depenglee1707 · closed · 6 months ago · 0 comments
#88 · fix issue: vllm cannot address runtime_env · depenglee1707 · closed · 6 months ago · 0 comments
#87 · vllm cannot address "runtime_env" · depenglee1707 · closed · 6 months ago · 1 comment
#86 · Refine description of repo · depenglee1707 · closed · 6 months ago · 0 comments
#85 · adopt streaming for ui with text-generation downstream task · depenglee1707 · closed · 6 months ago · 0 comments
#84 · fix issue: non-support streaming pipeline cannot work when call it as stream · depenglee1707 · closed · 6 months ago · 0 comments
#83 · enhance llamacpp integration to share soma logic between streaming and predict · depenglee1707 · closed · 6 months ago · 1 comment
#82 · Refactor streaming · depenglee1707 · closed · 6 months ago · 0 comments
#81 · Fix prompt is not string bug · SeanHH86 · closed · 6 months ago · 1 comment
#80 · fix issue: stream generation is slow · depenglee1707 · closed · 6 months ago · 0 comments
#79 · enhance name of router for comparation scenario · depenglee1707 · closed · 6 months ago · 0 comments
#78 · Fix path params issue, make interface consistent · depenglee1707 · closed · 6 months ago · 0 comments
#77 · update log · SeanHH86 · closed · 6 months ago · 0 comments
#76 · Updata logs · SeanHH86 · closed · 6 months ago · 0 comments
#75 · Fix stream without prompt format · SeanHH86 · closed · 6 months ago · 0 comments
#74 · fix generate bug for stream api of llamacpp · SeanHH86 · closed · 6 months ago · 0 comments
#73 · correct vllm version · depenglee1707 · closed · 6 months ago · 0 comments
#72 · Failed to load qwen1_5-72b-chat-q5_k_m.gguf · SeanHH86 · closed · 6 months ago · 3 comments
#71 · add Qwen1.5-72B-GGUF yaml and fix load json input error · SeanHH86 · closed · 6 months ago · 1 comment
#70 · Make scale out policy consistent between deployments · depenglee1707 · closed · 6 months ago · 0 comments
#69 · keep removing deprecated stuff · depenglee1707 · closed · 6 months ago · 0 comments
#68 · Support load Qwen1.5-72B-Chat-GPTQ-Int4 by auto_gptq · SeanHH86 · opened · 6 months ago · 1 comment
#67 · Model streaming API enhancement · SeanHH86 · closed · 6 months ago · 2 comments
#66 · add streaming API support · SeanHH86 · closed · 6 months ago · 0 comments
#65 · Enable chat template applied for vllm integration · depenglee1707 · closed · 7 months ago · 0 comments
#64 · update Qwen1.5-72B yaml · SeanHH86 · closed · 7 months ago · 0 comments
#63 · Fix json format issue for "transformerpipeline" · depenglee1707 · closed · 7 months ago · 0 comments
#62 · fix load json data with '\n' failed · SeanHH86 · closed · 7 months ago · 0 comments
#61 · Remove the original implements for vllm integration · depenglee1707 · closed · 7 months ago · 0 comments
#60 · Refactor the solution of vllm integration · depenglee1707 · closed · 7 months ago · 0 comments
#59 · Install dependency llama-cpp-python failed · SeanHH86 · opened · 7 months ago · 4 comments
#58 · remove useless stuff · depenglee1707 · closed · 7 months ago · 0 comments
#57 · enable prompt template for gguf format inference · depenglee1707 · closed · 7 months ago · 0 comments
#56 · Update ray to 2.9.3 · SeanHH86 · closed · 7 months ago · 0 comments
#55 · Expose model generate parameters by API server · SeanHH86 · opened · 7 months ago · 0 comments
#54 · Enable chat template for huggingface transformer · depenglee1707 · closed · 7 months ago · 1 comment
#53 · Generate incorrect text format when use pipeline defaulttransformers · SeanHH86 · closed · 7 months ago · 2 comments
#52 · Enhance inference API to support OpenAI style · SeanHH86 · closed · 5 months ago · 3 comments
#51 · enable "use_bettertransformer" and "torch_compile" · depenglee1707 · closed · 7 months ago · 0 comments
#50 · fix output issue 4 ui · depenglee1707 · closed · 7 months ago · 0 comments
#49 · fix max-token conflict w/ DS · depenglee1707 · closed · 7 months ago · 0 comments
#48 · No default value for "timeout" if missing "batch_wait_timeout_s: 0" in yaml config · depenglee1707 · opened · 7 months ago · 1 comment
#47 · enable deepspeed inference · depenglee1707 · closed · 7 months ago · 0 comments
#46 · add parameter for timeout · SeanHH86 · closed · 7 months ago · 0 comments
#45 · Inference throw timeout sometime · SeanHH86 · closed · 7 months ago · 1 comment
#44 · Add Qwen1.5-72B-chat · SeanHH86 · closed · 7 months ago · 0 comments