-
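I have two questions:
1) How can I launch an OpenAI-compatible inference service based on the Qwen1.5 model that also supports function calling?
I see that vLLM and SGLang can launch a server, but they do not support function calling, and I could not find any documentation in the qwen-agent project describing how to start the service.
2) The other question: the current function call examples look similar to the way OpenAI is called, so I would like to confirm, qwen…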
-
The README says
> This version is not compatible with the HF transformer implementation and must be used with SGLang or LLaVA implementation.
I hope to load and run the model with https://github…
zqzqz updated
3 months ago
-
Awesome project. We have a paper https://arxiv.org/abs/2310.14034 with really complicated KV caching that I would love to go back and implement in SGLang.
I tried to get an example working in Cola…
srush updated
3 months ago
-
RadixAttention, a novel technique for automatic KV cache reuse during runtime. Furthermore, RadixAttention is compatible with existing techniques like continuous batching and paged attention.
Blog:…
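
To illustrate the general idea of reusing cached work for a shared prefix, here is a toy sketch of prefix matching over token IDs. It is not SGLang's implementation: the `PrefixCache` class and its methods are hypothetical names, and a real system would store KV tensors on the nodes and handle eviction.

```python
class PrefixCache:
    """Toy trie keyed by token IDs (hypothetical stand-in for a KV cache index)."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        # Walk or create one trie node per token.
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match_prefix(self, tokens):
        # Return how many leading tokens are already present in the trie.
        node = self.root
        matched = 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            matched += 1
        return matched


cache = PrefixCache()
cache.insert([1, 2, 3, 4])                    # tokens of an earlier request
reused = cache.match_prefix([1, 2, 3, 9, 9])  # a new request sharing a prefix
print(reused)
```

In this sketch only the tokens after the matched prefix would need fresh computation.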
-
I want to generate the following format, that is, a list of JSON objects:
[
{"name": "Alice", "age": 1},
{"name": "Bob", "age": 2},
]
The number of the objects in the list is random depending on the outpu…
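
One possible way to describe such a format is a regular expression for a JSON array with a variable number of objects. The sketch below checks the pattern with Python's standard `re` module only. The names `OBJECT`, `LIST_OF_OBJECTS`, and `matches_format` are illustrative, the pattern assumes strict JSON (no trailing comma before `]`), and whether a particular constrained-decoding backend accepts this exact regex is an assumption.

```python
import json
import re

# One object with a string "name" and a non-negative integer "age".
OBJECT = r'\{"name": "[A-Za-z ]+", "age": [0-9]+\}'
# A JSON array holding one or more such objects, separated by ", ".
LIST_OF_OBJECTS = r"\[" + OBJECT + r"(, " + OBJECT + r")*\]"


def matches_format(text: str) -> bool:
    """Return True if the whole string is a list of name/age objects."""
    return re.fullmatch(LIST_OF_OBJECTS, text) is not None


sample = '[{"name": "Alice", "age": 1}, {"name": "Bob", "age": 2}]'
parsed = json.loads(sample)  # the same string is also valid JSON
print(matches_format(sample), len(parsed))
```

The `(, OBJECT)*` repetition is what leaves the number of objects open.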
-
Hi, I thought this might be of interest to the SGLang community too.
https://github.com/outlines-dev/outlines/issues/842
I raised this issue in Outlines, as I see that the `build_regex_from_obje…
-
I noticed that the classification LLM inference is fairly coarse-grained, so I am opening this issue to collect suggestions over time.
-
Hi,
SGLang supports parallelism [link](https://github.com/sgl-project/sglang#parallelism).
As in the example at that link, can I call the API with different sampling parameters in parallel?
for …
-
Building from source by following the instructions in the README gives the error below.
```
(build) owu@gpu:/mnt/resource_nvme/sglang/python$ pip install -e "python[all]"
(build) owu@gpu:/mnt/resource_nvm…
owu-1 updated
7 months ago
-
`vllm.model_executor.input_metadata` no longer exists in newer versions of vLLM. Below is my attempt to run with vllm-0.4.0.post1 installed.
```
(build) owu@gpu:/mnt/resource_nvme$ python -m sglang.launch_s…
owu-1 updated
6 months ago