-
Nice project!
I believe this project can greatly benefit from https://github.com/sgl-project/sglang. You can try to use SGLang as a backend for local models.
- The fast JSON decoding [feature](h…
-
### Proposal to improve performance
vLLM is underperforming compared with SGLang. Something needs optimization to improve performance.
### Report of performance regression
https…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue y…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related iss…
-
I've been investigating a performance issue with SGLang on RunPod's serverless platform. Here are my key findings:
I identified that SGLang performs significantly worse on the serverless setup comp…
-
Hi,
There seem to be some big changes, and I cannot find a single example showing how to load the Hugging Face models I was previously using with `HF.model`. Also, the dspy AI tool is broken and no…
-
Versions: evalscope 0.5.3
sglang 0.3.0
I launched a local SGLang OpenAI API server with the following command:
CUDA_VISIBLE_DEVICES=4,5,6,7 python -m sglang.launch_server --model-path /local/models/Qwen2-72B-Instruct --tp 4 …
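Since `launch_server` exposes an OpenAI-compatible API, the behavior can usually be reproduced with a plain chat-completions request. The sketch below only constructs the request payload; the base URL (SGLang's default port 30000) and the served model name are assumptions and may differ depending on your launch flags.

```python
import json

# Minimal sketch of a request against an SGLang OpenAI-compatible server.
# Assumptions: the server listens on the default port 30000 and serves the
# model under the name "Qwen2-72B-Instruct"; adjust both if your setup differs.
BASE_URL = "http://127.0.0.1:30000/v1/chat/completions"

payload = {
    "model": "Qwen2-72B-Instruct",
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.0,
    "max_tokens": 32,
}

body = json.dumps(payload)
print(body)
# POST `body` to BASE_URL with Content-Type: application/json
# (e.g. via requests.post) once the server is running.
```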
-
We would like to integrate the [cascade attention kernel](https://flashinfer.ai/2024/02/02/cascade-inference.html) from flashinfer.
Code pointers:
- Attention backend in sglang: https://github.com…
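For context on what the cascade kernel computes: cascade inference runs attention separately over the shared prefix and each request's unique suffix, then merges the two partial attention states using their log-sum-exp weights. A minimal NumPy sketch of that merge recurrence (not FlashInfer's actual API) might look like:

```python
import numpy as np

def attention_state(q, k, v):
    """Partial attention over one KV segment: returns (output, log-sum-exp)."""
    s = k @ q                        # scores for a single query vector, shape (n,)
    lse = np.log(np.sum(np.exp(s)))  # log-sum-exp of this segment's scores
    o = np.exp(s - lse) @ v          # softmax-weighted average of the values
    return o, lse

def merge_states(o1, lse1, o2, lse2):
    """Merge two partial attention states (the cascade merge operator)."""
    lse = np.logaddexp(lse1, lse2)
    return np.exp(lse1 - lse) * o1 + np.exp(lse2 - lse) * o2

# Splitting the KV cache into a "shared prefix" segment and a "unique suffix"
# segment and merging should match attention over the full sequence.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
k, v = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))

o_full, _ = attention_state(q, k, v)
o1, l1 = attention_state(q, k[:3], v[:3])   # prefix segment
o2, l2 = attention_state(q, k[3:], v[3:])   # suffix segment
o_merged = merge_states(o1, l1, o2, l2)
print(np.allclose(o_full, o_merged))  # True
```

This is why the kernel can reuse one pass over the shared prefix across a whole batch: only the cheap suffix passes and the merge are per-request.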
-
The [announcement blog post](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/) indicates that inference can be done with sglang, but attempting to load the 7B model with the sglang backend:
…