apoorvumang / prompt-lookup-decoding


Draw common n-grams from a large corpus of LLM-generated text? #6

Open kinchahoy opened 4 months ago

kinchahoy commented 4 months ago

Hey - I was thinking about doing something with n-grams to speed up speculative decoding and I ran into your repo.

I was wondering if you've explored sourcing n-grams from a large corpus of text (say, wikitext2) or from a large body of a specific LLM's output? The idea is to build a ~100 MB lookup table of high-likelihood n-grams: 3-token keys that predict a standard pattern for the next 3-5 tokens. Over time you could refine the table so it's worth the speculative decoding cost, i.e. only keep entries that hit ~20+% of the time. We know that models tend to have phrases they prefer (https://www.reddit.com/r/ChatGPT/comments/16e9l7a/what_are_the_most_common_words_and_phrases_used/).
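Concretely, here's a minimal sketch of how such a table could be built offline and queried at decode time. It assumes the corpus is already tokenized into a flat list of token ids, and all the names here (`build_ngram_table`, `propose_draft`, `PREFIX_LEN`, `DRAFT_LEN`, `MIN_HIT_RATE`) are illustrative, not anything from this repo:

```python
# Illustrative sketch: an offline n-gram lookup table for speculative drafts.
from collections import Counter, defaultdict

PREFIX_LEN = 3      # n-gram key length: 3 tokens, as proposed above
DRAFT_LEN = 4       # speculative continuation length (3-5 tokens suggested)
MIN_HIT_RATE = 0.2  # keep only entries expected to hit ~20+% of the time

def build_ngram_table(tokens):
    """Map each PREFIX_LEN-token key to its single most common continuation,
    keeping only keys whose top continuation is frequent enough to be worth
    the speculative decoding cost."""
    continuations = defaultdict(Counter)
    for i in range(len(tokens) - PREFIX_LEN - DRAFT_LEN + 1):
        prefix = tuple(tokens[i : i + PREFIX_LEN])
        draft = tuple(tokens[i + PREFIX_LEN : i + PREFIX_LEN + DRAFT_LEN])
        continuations[prefix][draft] += 1

    table = {}
    for prefix, counter in continuations.items():
        draft, count = counter.most_common(1)[0]
        if count / sum(counter.values()) >= MIN_HIT_RATE:
            table[prefix] = draft
    return table

def propose_draft(table, generated_tokens):
    """At decode time, look up the last PREFIX_LEN generated tokens and, on
    a hit, return the cached continuation as the draft to verify in a single
    forward pass of the target model."""
    if len(generated_tokens) < PREFIX_LEN:
        return None
    return table.get(tuple(generated_tokens[-PREFIX_LEN:]))
```

Note the ~20% threshold here is just the empirical frequency of the top continuation given its key in the corpus, which is only a proxy for how often the target model would actually accept the draft; to stay near 100 MB you'd presumably also cap the table size by keeping the highest-count entries.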

I'm thinking of exploring something like this (perhaps reusing your work in llama.cpp), but let me know if you've already looked into it or think it's unlikely to work.