-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is this question answered in the FAQ?
-
Using `flashinfer` in `sglang` with `google/gemma-7b-it` fails with:
```text
File "/home/ubuntu/sglang-venv/lib/python3.11/site-packages/flashinfer/prefill.py", line 462, in forward
    return self._wrapper.…
```
-
### Your current environment
```text
PyTorch version: 2.3.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.1.1 (arm64)
GCC version: Could not colle…
```
-
We will implement this based on [the llama.cpp grammars README](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md).
The idea is as follows, given a parsed BNF grammar (see the sketch after these steps):
0) While the model is calculating the logits, …
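For illustration, here is a minimal sketch of the logit-masking step. It assumes a hypothetical helper that derives the set of grammar-legal next-token ids from the parsed BNF state; the grammar walker itself is not shown:
```python
import torch

def apply_grammar_mask(logits: torch.Tensor, allowed_token_ids: list[int]) -> torch.Tensor:
    """Keep only grammar-legal tokens by masking everything else to -inf."""
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    return logits + mask

# Sketch of one decoding step:
# logits = model(input_ids)[-1]            # [vocab_size] logits for the next token
# ids = grammar_state.allowed_token_ids()  # hypothetical: walk the parsed BNF
# next_token = torch.argmax(apply_grammar_mask(logits, ids))
```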
-
### System Info
```text
(Xinference) root@Server-058266b4-896f-4df5-9763-65d3e241d655:~# pip list
Package                           Version
--------------------------------- ------------
accelerate…
```
-
### Feature request
Add support for more multi-modal models going forward. LLaVA 1.6 is one option, but waiting for whichever strong model comes out next (IDEFICS 2?) would also be fine.
### Motivation
…
-
sglang installs the latest vllm, and it looks like this module was removed in vllm 0.4.0:
`ModuleNotFoundError: No module named 'vllm.model_executor.input_metadata'`
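A quick way to confirm what this report describes, checking the installed vllm version and whether the module still resolves:
```python
import importlib.util
import vllm

print(vllm.__version__)
# find_spec returns None when the module no longer exists,
# which this report says is the case on vllm >= 0.4.0.
print(importlib.util.find_spec("vllm.model_executor.input_metadata"))
```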
-
I see the option for `sgl.set_default_backend()`, but this seems to be a global setting. Is there a way to have multiple backends running and pick which one is used per call?
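To illustrate the question, something like the following is what is being asked for; the per-call `backend=` keyword to `run()` is an assumption here, not a documented API:
```python
import sglang as sgl

# Two independent backends pointing at different servers.
backend_a = sgl.RuntimeEndpoint("http://localhost:30000")
backend_b = sgl.RuntimeEndpoint("http://localhost:30001")

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

sgl.set_default_backend(backend_a)  # the global setting mentioned above

# Desired: pick a backend per call instead of globally.
# (Assumption: run() accepts a backend override.)
state = qa.run(question="Hello?", backend=backend_b)
```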
-
### Motivation
Prefix caching is supported in many projects, such as vllm, sglang, and rtp-llm. The torch engine is going to support this feature in https://github.com/InternLM/lmdeploy/pull/1393. So we ra…
-
I'm running the runtime directly, like so:
```python
# Imports assumed for this snippet; handle_port_init lived in sglang.srt.utils at the time.
import sglang as sgl
from sglang.srt.utils import handle_port_init

SGLANG_PORT, additional_ports = handle_port_init(30000, None, 1)
RUNTIME = sgl.Runtime(
    model_path=model_path,
    port=SGLANG_PORT,
    addi…
```
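For completeness, here is how the runtime would then be used, following the `sgl.set_default_backend(runtime)` pattern from the sglang README (a sketch continuing the snippet above, not the full script):
```python
# Use the in-process runtime as the default backend.
sgl.set_default_backend(RUNTIME)

@sgl.function
def hello(s):
    s += "Q: What is sglang?\nA:"
    s += sgl.gen("answer", max_tokens=32)

state = hello.run()
print(state["answer"])

RUNTIME.shutdown()  # stop the server process when done
```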