OpenCSGs / llm-inference
llm-inference is a platform for publishing and managing LLM inference services. It provides a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.
Apache License 2.0 · 69 stars · 17 forks
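Since the project exposes an OpenAI-style REST API (issue #125 below adds it), a client call can be sketched as follows. This is a minimal sketch, not the project's documented client: the base URL, port, endpoint path, and model name (`opt-125m`, which appears in issue #138) are placeholder assumptions for a local deployment.

```python
import json
from urllib import request

# Hypothetical endpoint: the actual host/port depend on how llm-inference is deployed.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build an OpenAI-style chat-completion request as (url, JSON body)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body).encode("utf-8")

def send(url: str, body: bytes) -> dict:
    """POST the request; this only succeeds against a running server."""
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    url, body = build_chat_request("opt-125m", "Hello!")
    print(url)
```

Because the request shape follows the OpenAI chat-completions convention, off-the-shelf OpenAI-compatible clients should also work against such an endpoint.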
Issues
#143 · upgrade vllm to v0.4.1 · depenglee1707 · closed 5 months ago · 0 comments
#142 · address apiserver to standalone file · depenglee1707 · closed 5 months ago · 0 comments
#141 · add description for TODO to avoid lost the context · depenglee1707 · closed 5 months ago · 0 comments
#140 · fix ui concurrency setting · depenglee1707 · closed 5 months ago · 0 comments
#139 · upgrade to pydantic v2 -_- · depenglee1707 · closed 5 months ago · 0 comments
#138 · Update opt-125m default autoscaler parameters to help understand for end user · SeanHH86 · closed 5 months ago · 1 comment
#137 · Api server blocked when one request is in-process · SeanHH86 · opened 5 months ago · 1 comment
#136 · Update files of auto scaling on k8s · SeanHH86 · closed 5 months ago · 0 comments
#135 · Serve run in thread · SeanHH86 · closed 5 months ago · 0 comments
#134 · Api server was blocked when LLM deployment scaling config beyond the cluster resouces · SeanHH86 · closed 5 months ago · 0 comments
#133 · Auto load models when api server start · SeanHH86 · closed 5 months ago · 0 comments
#132 · Auto load models from ./models for when api server start · SeanHH86 · closed 5 months ago · 1 comment
#131 · Upgrade ray 2.20.0 · SeanHH86 · closed 5 months ago · 0 comments
#130 · Upgrade ray 2.20.0 · SeanHH86 · closed 5 months ago · 1 comment
#129 · lock vllm and xformers version to fix conflict · SeanHH86 · closed 5 months ago · 0 comments
#128 · Error happen when do inference for wukong dtype=bfloat16 of use default transformer pipeline load model · SeanHH86 · closed 6 months ago · 1 comment
#127 · Add csg wukong model · SeanHH86 · closed 6 months ago · 1 comment
#126 · Enable tensor paramlism for deepspeed · depenglee1707 · closed 6 months ago · 0 comments
#125 · Add new api in openai style · SeanHH86 · closed 6 months ago · 0 comments
#124 · correct generated metric · depenglee1707 · closed 6 months ago · 0 comments
#123 · GGUF implements will make duplicate copy since cannot detect config.json file in the cache folder · depenglee1707 · opened 6 months ago · 0 comments
#122 · add llama3-8b from csghub · depenglee1707 · closed 6 months ago · 0 comments
#121 · avoid to invoke hf to speed up deployment process · depenglee1707 · closed 6 months ago · 1 comment
#120 · vllm implements cannot support download model from repo besides hg · depenglee1707 · opened 6 months ago · 2 comments
#119 · Load path model issue · depenglee1707 · closed 6 months ago · 1 comment
#118 · the pipeline integration cannot address pad_token/eos_token absent · depenglee1707 · closed 6 months ago · 0 comments
#117 · avoid to ping huggingface when start serving to speed up the deployement · depenglee1707 · closed 6 months ago · 0 comments
#116 · vllm, gguf, llamacpp, these integration cannot address local path of model · depenglee1707 · closed 6 months ago · 0 comments
#115 · support vllm on-fly generate params · depenglee1707 · closed 6 months ago · 0 comments
#114 · update doc for load model from local path · SeanHH86 · closed 6 months ago · 0 comments
#113 · suport on-fly generate params · depenglee1707 · closed 6 months ago · 1 comment
#112 · change default static batch setting · depenglee1707 · closed 6 months ago · 0 comments
#111 · single prompt will failed in streming · depenglee1707 · closed 6 months ago · 0 comments
#110 · simplify readme · depenglee1707 · closed 6 months ago · 0 comments
#109 · recover the text-classification and summarization downstream task sup… · depenglee1707 · closed 6 months ago · 0 comments
#108 · question-answer downstream task not work since the input-output format wrong · depenglee1707 · closed 6 months ago · 0 comments
#107 · fix llamacpp(gguf) broked by "revision" · depenglee1707 · closed 6 months ago · 0 comments
#106 · Refine model config yamls · depenglee1707 · closed 6 months ago · 0 comments
#105 · translation model broken since wrong handling of its output · depenglee1707 · closed 6 months ago · 0 comments
#104 · enable reset generate config on fly · depenglee1707 · closed 6 months ago · 0 comments
#103 · Add inference SDK for invoke · SeanHH86 · opened 6 months ago · 0 comments
#102 · deepspeed cannot work, since the input token not addressed on right device · depenglee1707 · closed 6 months ago · 0 comments
#101 · update quickstart.md to remove evaluate · wanggxa · closed 6 months ago · 1 comment
#100 · The usage introduction of `llm-serve` is not correct in quick_start.md · depenglee1707 · closed 6 months ago · 0 comments
#99 · Requested tokens (817) exceed context window of 512 · SeanHH86 · opened 6 months ago · 3 comments
#98 · Model inference cross multi-nodes · SeanHH86 · opened 6 months ago · 0 comments
#97 · API server startup slow · SeanHH86 · closed 5 months ago · 1 comment
#96 · fix llm-serve list · depenglee1707 · closed 6 months ago · 0 comments
#95 · refine cli, make cli self-explanatory · depenglee1707 · closed 6 months ago · 0 comments
#94 · support revision to aviod download latest version of mode · depenglee1707 · closed 6 months ago · 0 comments