-
As part of [SGLang Issue #1487](https://github.com/sgl-project/sglang/issues/1487), SGLang plans to move vLLM to optional dependencies and use flashinfer as the main dependency.
I am working on mo…
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
_No response_
### Your current environment (if you think i…
-
https://github.com/sgl-project/sglang
-
@wenhuach21 GPTQModel has merged `dynamic` per layer/module control of quantization but I don't think auto-round currently supports such per layer/module control during quantization. I know this is s…
-
vLLM was recently updated to use PyTorch 2.5, so we can now benchmark torchao with torch.compile (this was previously blocked on the 2.5 update).
1. install most recent vllm: `pip install https://vllm-wheels.s3.us…
-
Hi,
Thank you for sharing your great work.
Is there any plan to release the code for generating LLM-based Complex Reasoning Question-Answer?
It seems there is no code for it.
I really apprecia…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build…
-
* the input should be a delta table with a specific schema (for idempotency and for recomputing inference if you change the model, etc.)
* the output will be a column called "predictions" (user definable) and a…
-
some duplication in https://github.com/pytorch/ao/blob/378e6a8d6854d77efba45fcb1a4091724e9cfaa9/torchao/_models/llama/generate.py#L215-L267 and https://github.com/pytorch/ao/blob/378e6a8d6854d77efba45…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…