-
OpenAI APIs have been quite unstable recently and often time out. Training a prompt requires quite a number of calls, which means you will almost certainly (99% of the time) experience a timeout. The current max_time is hard-coded as part…
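A common client-side workaround, sketched below, is to raise the request timeout and retry count on the OpenAI client itself when calling the API directly; the `timeout` and `max_retries` arguments are from the official `openai` Python package, and the specific values and model name are illustrative assumptions rather than anything from this issue or from sglang's hard-coded `max_time`.
```python
from openai import OpenAI

# Illustrative values only: wait longer per request and retry a few times
# instead of relying on the client defaults.
client = OpenAI(timeout=120.0, max_retries=5)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```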
-
Google released their new open model, Gemma. https://huggingface.co/google/gemma-7b-it
But we can't use it on FastChat with sglang yet:
```
File "/p/haicluster/llama/FastChat/sc_venv_sglang/sg…
-
I was trying to create embeddings with my existing LLM by using
```
curl http://10.85.13.190:8100/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "Llama-2-13b-chat-hf…
-
```
Rank 0: load weight begin.
Rank 0: load weight end.
Rank 0: max_total_num_token=45937, max_prefill_num_token=7656, context_len=4096,
disable_radix_cache=False, enable_flashinfer=False, disabl…
-
Hi,
I'm trying to call the multimodal sglang server using the OpenAI-compatible API.
This is the code to start the server:
```commandline
python3 -m sglang.launch_server --model-path liuhaotian/ll…
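# (A hedged sketch, not from the original issue.) Once the server is up, an
# OpenAI-compatible multimodal chat request would look roughly like the
# following; the port, prompt, image URL, and model field are assumptions and
# should match the --port and --model-path the server was launched with.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-path-used-at-launch>",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
      ]
    }]
  }'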
-
## Description
Running `python -m sglang.launch_server` yields this error. I was running it in a container, which I'll describe below.
```
outlines-server-1 | Traceback (most recent call last)…
-
As the topic suggests.
Using pydantic -> regex
It causes the model to subsequently output blanks as the logit mask goes on.
Is there any way to escape a `"` produced by the model in a JSON field?
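One way to look at this, sketched below under the assumption that the failure comes from the pydantic-derived regex terminating the string at the first `"`: use a string pattern that also accepts backslash-escaped characters, and substitute it for the string part of the generated regex (for example via the `regex=` argument of `sgl.gen`). The pattern and the test value are illustrative, not taken from the issue.
```python
import re

# A JSON string: any run of characters that are neither a quote nor a
# backslash, or a backslash followed by any escape character. A literal \"
# emitted by the model therefore does not end the field early.
JSON_STRING = r'"(?:[^"\\]|\\.)*"'

# Sanity check: an escaped quote inside the value is accepted.
assert re.fullmatch(JSON_STRING, r'"she said \"hi\""')
```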
-
Hi
Thanks for the great library!
I need to run inference on a ton of sequences and get their log probabilities. I have approximately 100K sequences, which can be binned into groups of 100 that share…
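Since that grouping is exactly what RadixAttention can exploit, one rough sketch is to send each prefix-sharing group as its own batch so the cached prefix KV gets reused. The endpoint, the payload fields (`return_logprob`, `logprob_start_len`), and the zero-token scoring trick below are my assumptions about the server's native `/generate` API, not the documented interface, and should be checked against the version you actually run.
```python
import requests

ENDPOINT = "http://localhost:30000/generate"  # assumed server address

def score_prefix_groups(groups):
    """groups: {shared_prefix: [suffix, ...]} -> raw server response per prefix."""
    results = {}
    for prefix, suffixes in groups.items():
        # One batch per prefix, so every sequence in it can reuse the
        # radix-cached prefix tokens.
        payload = {
            "text": [prefix + s for s in suffixes],
            # Scoring only; if the server insists on decoding at least one
            # token, set this to 1 and ignore the generated token.
            "sampling_params": {"max_new_tokens": 0},
            "return_logprob": True,
            "logprob_start_len": 0,  # request logprobs over the whole prompt
        }
        results[prefix] = requests.post(ENDPOINT, json=payload).json()
    return results
```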
-
Hi there, amazing work on RadixAttention and JSON-constrained decoding. I am running into some unexpected performance issues comparing sglang and vllm. I use the latest pip release of vllm, and use git-clone-ed s…
-
### System Info
Package Version
--------------------------------- --------------
absl-py 2.1.0
accelerate 0.33.0
…