-
https://github.com/ollama/ollama
https://github.com/abetlen/llama-cpp-python
https://github.com/vllm-project/vllm
-
Expected release date: Mar 15th, 2024
# General
1. [x] Support general page table layout (@yzh119)
2. [ ] sm70/75 compatibility (@yzh119)
3. [ ] performance: using fp16 as intermediate data ty…
-
target task: summarization
distillation: teacher → student (draft model)
t5-xl: target, t5-small: drafter
n-gram: ...?
n-gram: KD
The n-gram drafter should be trained on a model-generated dataset.
…
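A minimal sketch of the note above, using hypothetical names and a toy token-id corpus standing in for real model-generated data: build a bigram table from sequences sampled from the target model, then greedily propose draft tokens from it.

```python
from collections import Counter, defaultdict

def build_ngram_drafter(token_seqs, n=2):
    """Count n-gram continuations over model-generated token sequences."""
    counts = defaultdict(Counter)
    for seq in token_seqs:
        for i in range(len(seq) - n + 1):
            ctx = tuple(seq[i:i + n - 1])
            counts[ctx][seq[i + n - 1]] += 1
    return counts

def draft(counts, context, n=2, k=3):
    """Greedily propose up to k draft tokens from the n-gram table."""
    out, ctx = [], list(context)
    for _ in range(k):
        key = tuple(ctx[-(n - 1):])
        if key not in counts:
            break  # unseen context: stop drafting
        tok = counts[key].most_common(1)[0][0]
        out.append(tok)
        ctx.append(tok)
    return out

# Toy "model-generated" dataset; integers stand in for real token ids.
data = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 4, 6]]
table = build_ngram_drafter(data)
print(draft(table, [1, 2]))  # → [3, 4, 6]
```

Because the table is built from the target model's own outputs rather than external text, the drafter's proposals tend to match what the target would have generated, which is the point of training the n-gram drafter on a model-generated dataset.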
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md
If possible, please add a speculative decoding example to the llama docs.
-
Hi @WoosukKwon and @zhuohan123,
Fantastic project!
I was taking a stab at implementing a version of **greedy** lookahead-decoding. Given some candidate completions, I was trying to:
1. Fork …
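The greedy verification step behind this idea can be sketched outside of vLLM. This is an assumption-laden illustration, not vLLM's API: `target_next_token` is a hypothetical callable standing in for the target model's argmax decode (in a real implementation the target would score all candidate positions in one batched forward pass rather than one call per token).

```python
def verify_greedy(target_next_token, prefix, draft_tokens):
    """Accept draft tokens while they match the target's greedy choice.

    target_next_token(tokens) -> the target model's argmax next token.
    Returns (accepted_tokens, next_token_from_target).
    """
    accepted = []
    tokens = list(prefix)
    for d in draft_tokens:
        t = target_next_token(tokens)
        if t != d:
            # First mismatch: keep the target's token, discard the rest.
            return accepted, t
        accepted.append(d)
        tokens.append(d)
    # All drafts accepted; the target still yields one bonus token.
    return accepted, target_next_token(tokens)

# Toy target model that always continues counting upward.
target = lambda toks: toks[-1] + 1
print(verify_greedy(target, [0], [1, 2, 5]))  # → ([1, 2], 3)
```

With greedy decoding, this acceptance rule reproduces exactly the sequence the target model would have produced on its own; the drafts only change how many tokens each target step can commit.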
-
### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/OpenAccess-AI-Collective/axolotl/discussions/categories…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I installed LlamaIndex with the command `pip install llama-index` and install t…
-
As described: the speculative decoding implementation works, but it should be sped up.