-
### Feature request
Hello, our models are deployed with TGI (v1.4.3), and we also want to use LoRAX. But I find that the TGI version LoRAX is based on is very different from TGI v1.4.3.
We …
-
### Motivation
I am one of the authors of the paper Stay On Topic with Classifier-Free Guidance ( https://openreview.net/forum?id=RiM3cl9MdK&noteId=s1BXLL1YZD ) who has been nominated as ICML'24 Spo…
-
I'm trying to run exllamav2 on an A100 in GKE, but I'm having trouble getting it to warm up. I'm currently using the old stream generator + speculative decoding with my own modifications and my own serv…
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
_No response_
### Your current environment (if you think i…
-
**What would you like to be added**:
Speculative decoding helps accelerate the prediction of large language models, and is supported by vLLM by default.
**Why is this needed**:
Impro…
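To illustrate the idea behind the request, here is a minimal, self-contained sketch of greedy speculative decoding. It is not vLLM's implementation; the two toy deterministic "models" and all function names are hypothetical, chosen only to show the draft-then-verify loop:

```python
def draft_model(seq):
    # Toy draft model: predicts (last + 1) % 10, but is deliberately
    # "wrong" when the last token is 4 (it predicts 0 instead of 5).
    last = seq[-1]
    return 0 if last == 4 else (last + 1) % 10

def target_model(seq):
    # Toy target model: always predicts (last + 1) % 10.
    return (seq[-1] + 1) % 10

def speculative_decode(seq, num_tokens, k=4):
    """Generate num_tokens tokens, letting the cheap draft model
    propose up to k tokens per step and the target model verify them."""
    seq = list(seq)
    produced = 0
    while produced < num_tokens:
        # 1. Draft proposes up to k tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(min(k, num_tokens - produced)):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies each proposed token greedily.
        for t in proposal:
            expected = target_model(seq)
            seq.append(t if t == expected else expected)
            produced += 1
            if t != expected:
                break  # remaining draft tokens build on a rejected token
    return seq

print(speculative_decode([1], 8))  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The speed-up in a real system comes from step 2: the target model can score all k proposed tokens in a single batched forward pass instead of k sequential ones, so every accepted draft token saves a full target-model step.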
-
Great work!
I was wondering whether the distilled version might still be compatible with CTranslate2 / faster-whisper? I understand the changes to the decoder might require some changes there, not …
-
Some apps, such as Duolingo, require continuous detection to work properly.
I downloaded an app that does speech-to-text using Android's backend, and it only worked with FUTO Voice in the “Standard”…
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 09-23 09:07:16 _custom_ops.py:18] Failed to import from vllm._C with …
-
Hello all,
Thanks for your great work here. We are implementing speculative decoding at mistral.rs, and were in the final stages of testing when we discovered some incredibly strange behavior. Spec…