OpenCSGs / llm-inference
llm-inference is a platform for publishing and managing LLM inference services. It provides a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.
Apache License 2.0 · 69 stars · 17 forks
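Since the project exposes an OpenAI-style REST API (issue #125 below adds it), a client call can be sketched as follows. This is a minimal sketch, not the project's documented client: the base URL, port, endpoint path, and model name (`opt-125m`, which appears in issue #138) are placeholder assumptions for a local deployment.

```python
import json
from urllib import request

# Hypothetical endpoint: the actual host/port depend on how llm-inference is deployed.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build an OpenAI-style chat-completion request as (url, JSON body)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body).encode("utf-8")

def send(url: str, body: bytes) -> dict:
    """POST the request; this only succeeds against a running server."""
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    url, body = build_chat_request("opt-125m", "Hello!")
    print(url)
```

Because the request shape follows the OpenAI chat-completions convention, off-the-shelf OpenAI-compatible clients should also work against such an endpoint.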
Issues
#143 · upgrade vllm to v0.4.1 · depenglee1707 · closed 5 months ago · 0 comments
#142 · address apiserver to standalone file · depenglee1707 · closed 5 months ago · 0 comments
#141 · add description for TODO to avoid lost the context · depenglee1707 · closed 5 months ago · 0 comments
#140 · fix ui concurrency setting · depenglee1707 · closed 5 months ago · 0 comments
#139 · upgrade to pydantic v2 -_- · depenglee1707 · closed 5 months ago · 0 comments
#138 · Update opt-125m default autoscaler parameters to help understand for end user · SeanHH86 · closed 5 months ago · 1 comment
#137 · Api server blocked when one request is in-process · SeanHH86 · opened 5 months ago · 1 comment
#136 · Update files of auto scaling on k8s · SeanHH86 · closed 5 months ago · 0 comments
#135 · Serve run in thread · SeanHH86 · closed 5 months ago · 0 comments
#134 · Api server was blocked when LLM deployment scaling config beyond the cluster resouces · SeanHH86 · closed 5 months ago · 0 comments
#133 · Auto load models when api server start · SeanHH86 · closed 5 months ago · 0 comments
#132 · Auto load models from ./models for when api server start · SeanHH86 · closed 5 months ago · 1 comment
#131 · Upgrade ray 2.20.0 · SeanHH86 · closed 5 months ago · 0 comments
#130 · Upgrade ray 2.20.0 · SeanHH86 · closed 5 months ago · 1 comment
#129 · lock vllm and xformers version to fix conflict · SeanHH86 · closed 5 months ago · 0 comments
#128 · Error happen when do inference for wukong dtype=bfloat16 of use default transformer pipeline load model · SeanHH86 · closed 6 months ago · 1 comment
#127 · Add csg wukong model · SeanHH86 · closed 6 months ago · 1 comment
#126 · Enable tensor paramlism for deepspeed · depenglee1707 · closed 6 months ago · 0 comments
#125 · Add new api in openai style · SeanHH86 · closed 6 months ago · 0 comments
#124 · correct generated metric · depenglee1707 · closed 6 months ago · 0 comments
#123 · GGUF implements will make duplicate copy since cannot detect config.json file in the cache folder · depenglee1707 · opened 6 months ago · 0 comments
#122 · add llama3-8b from csghub · depenglee1707 · closed 6 months ago · 0 comments
#121 · avoid to invoke hf to speed up deployment process · depenglee1707 · closed 6 months ago · 1 comment
#120 · vllm implements cannot support download model from repo besides hg · depenglee1707 · opened 6 months ago · 2 comments
#119 · Load path model issue · depenglee1707 · closed 6 months ago · 1 comment
#118 · the pipeline integration cannot address pad_token/eos_token absent · depenglee1707 · closed 6 months ago · 0 comments
#117 · avoid to ping huggingface when start serving to speed up the deployement · depenglee1707 · closed 6 months ago · 0 comments
#116 · vllm, gguf, llamacpp, these integration cannot address local path of model · depenglee1707 · closed 6 months ago · 0 comments
#115 · support vllm on-fly generate params · depenglee1707 · closed 6 months ago · 0 comments
#114 · update doc for load model from local path · SeanHH86 · closed 6 months ago · 0 comments
#113 · suport on-fly generate params · depenglee1707 · closed 6 months ago · 1 comment
#112 · change default static batch setting · depenglee1707 · closed 6 months ago · 0 comments
#111 · single prompt will failed in streming · depenglee1707 · closed 6 months ago · 0 comments
#110 · simplify readme · depenglee1707 · closed 6 months ago · 0 comments
#109 · recover the text-classification and summarization downstream task sup… · depenglee1707 · closed 6 months ago · 0 comments
#108 · question-answer downstream task not work since the input-output format wrong · depenglee1707 · closed 6 months ago · 0 comments
#107 · fix llamacpp(gguf) broked by "revision" · depenglee1707 · closed 6 months ago · 0 comments
#106 · Refine model config yamls · depenglee1707 · closed 6 months ago · 0 comments
#105 · translation model broken since wrong handling of its output · depenglee1707 · closed 6 months ago · 0 comments
#104 · enable reset generate config on fly · depenglee1707 · closed 6 months ago · 0 comments
#103 · Add inference SDK for invoke · SeanHH86 · opened 6 months ago · 0 comments
#102 · deepspeed cannot work, since the input token not addressed on right device · depenglee1707 · closed 6 months ago · 0 comments
#101 · update quickstart.md to remove evaluate · wanggxa · closed 6 months ago · 1 comment
#100 · The usage introduction of `llm-serve` is not correct in quick_start.md · depenglee1707 · closed 6 months ago · 0 comments
#99 · Requested tokens (817) exceed context window of 512 · SeanHH86 · opened 6 months ago · 3 comments
#98 · Model inference cross multi-nodes · SeanHH86 · opened 6 months ago · 0 comments
#97 · API server startup slow · SeanHH86 · closed 5 months ago · 1 comment
#96 · fix llm-serve list · depenglee1707 · closed 6 months ago · 0 comments
#95 · refine cli, make cli self-explanatory · depenglee1707 · closed 6 months ago · 0 comments
#94 · support revision to aviod download latest version of mode · depenglee1707 · closed 6 months ago · 0 comments