-
I built from source, and the flashinfer version is 0.1.6:
```
pip install -e .
Obtaining file:///home/hutl/api/flashinfer/python
Preparing metadata (setup.py) ... done
Installing collected pac…
```
-
## Description
Integrate [`sglang`](https://github.com/sgl-project/sglang)
-
**Is your feature request related to a problem? Please describe.**
This is kind of an 'FYI' for @vkehfdl1 from our previous brief coffee chat. You may simply close this issue if you are already full…
-
I tried to use preble to deploy a model with sglang, but got an error:
```
$ preble run -port 6666 -model /workspace/LLMs/Qwen2-7B-Instruct
Traceback (most recent call last):
File "/usr/local/bin/…
```
-
lmsysorg/sglang:latest
-
## 🐛 Bug
## To Reproduce
Steps to reproduce the behavior:
1. Download the weights for LLama 3.2 1B and 3B from huggingface: https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q0f16-MLC a…
-
### System Info
python=3.11.10
vllm=0.6.3.post1
transformers=0.6.3.post1
vllm-cpp-python=0.3.1
Ubuntu="18.04.6 LTS (Bionic Beaver)"
gpu=3090 (24 GB)
### Running Xinference with Docker? /…
-
Nice project!
I believe this project could greatly benefit from https://github.com/sgl-project/sglang. You could try using SGLang as a backend for local models.
- The fast JSON decoding [feature](h…
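The fast JSON decoding mentioned above relies on constrained decoding: the generator is only allowed to emit tokens consistent with a target structure. A minimal plain-Python sketch of that idea follows (the template, names, and `render` helper here are illustrative, not SGLang's API; SGLang's actual implementation compiles regexes/JSON schemas into automata over the tokenizer vocabulary):

```python
import json

# Toy illustration of template-constrained decoding: structural characters
# are forced verbatim, and the "model" only fills the free slots ('?'),
# so the output is valid JSON by construction.
TEMPLATE = '{"name": "?", "age": ?}'

def render(template: str, fills: list[str]) -> str:
    """Emit forced template characters; draw from `fills` at each free
    slot (standing in for the model's constrained token choices)."""
    it = iter(fills)
    return "".join(next(it) if ch == "?" else ch for ch in template)

out = render(TEMPLATE, ["Ada", "36"])
print(out)               # {"name": "Ada", "age": 36}
json.loads(out)          # parses: the structure was enforced during decoding
```

Because every structural character is forced, the decoder never has to backtrack or re-prompt to repair malformed JSON, which is where the speedup comes from.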