-
### Feature request
Hello, our models are deployed with TGI (v1.4.3), and we also want to use LoRAX. But I find that the TGI version LoRAX is based on is very different from TGI v1.4.3.
We …
-
### Motivation
I am one of the authors of the paper Stay On Topic with Classifier-Free Guidance ( https://openreview.net/forum?id=RiM3cl9MdK&noteId=s1BXLL1YZD ) who has been nominated as ICML'24 Spo…
-
I'm trying to run exllamav2 on an A100 in GKE, but I'm having trouble getting it to warm up. I'm currently using the old stream generator + speculative decoding with my own modifications and my own serv…
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
_No response_
### Your current environment (if you think i…
-
**What would you like to be added**:
Speculative decoding helps accelerate the prediction of large language models, and is supported by vLLM by default.
**Why is this needed**:
Impro…
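To illustrate the idea behind the request, here is a minimal, self-contained sketch of greedy speculative decoding. It is not vLLM's implementation; the two toy deterministic "models" and all function names are hypothetical, chosen only to show the draft-then-verify loop:

```python
def draft_model(seq):
    # Toy draft model: predicts (last + 1) % 10, but is deliberately
    # "wrong" when the last token is 4 (it predicts 0 instead of 5).
    last = seq[-1]
    return 0 if last == 4 else (last + 1) % 10

def target_model(seq):
    # Toy target model: always predicts (last + 1) % 10.
    return (seq[-1] + 1) % 10

def speculative_decode(seq, num_tokens, k=4):
    """Generate num_tokens tokens, letting the cheap draft model
    propose up to k tokens per step and the target model verify them."""
    seq = list(seq)
    produced = 0
    while produced < num_tokens:
        # 1. Draft proposes up to k tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(min(k, num_tokens - produced)):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies each proposed token greedily.
        for t in proposal:
            expected = target_model(seq)
            seq.append(t if t == expected else expected)
            produced += 1
            if t != expected:
                break  # remaining draft tokens build on a rejected token
    return seq

print(speculative_decode([1], 8))  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The speed-up in a real system comes from step 2: the target model can score all k proposed tokens in a single batched forward pass instead of k sequential ones, so every accepted draft token saves a full target-model step.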
-
Great work!
I was wondering whether the distilled version might still be compatible with CTranslate2 / faster-whisper? I understand the changes to the decoder might require some changes there, not …
-
Some apps, such as Duolingo, require continuous detection to work properly.
I downloaded an app that does speech-to-text using Android's backend, and it only worked with FUTO Voice in the “Standard”…
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 09-23 09:07:16 _custom_ops.py:18] Failed to import from vllm._C with …
-
Hello all,
Thanks for your great work here. We are implementing speculative decoding at mistral.rs, and were in the final stages of testing when we discovered some incredibly strange behavior. Spec…