PygmalionAI / aphrodite-engine

PygmalionAI's large-scale inference engine
https://pygmalion.chat
GNU Affero General Public License v3.0

feat: ngram prompt lookup decoding #438

Closed: AlpinDale closed this pull request 3 weeks ago

AlpinDale commented 3 weeks ago

Adds an n-gram prompt lookup decoding mode to speculative decoding, which enables speculative decoding without a separate draft model.
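In prompt lookup decoding, draft tokens come from the sequence itself rather than from a draft model. The most recent n tokens are matched against earlier text, and the tokens that followed the match are proposed as the draft. The sketch below shows that idea. Assumptions: `propose_ngram_draft` is a hypothetical helper, not Aphrodite's API. It tries the longest n-gram first and takes the most recent earlier match. The real implementation may pick matches differently. Its `ngram_min`, `ngram_max` and `num_speculative_tokens` parameters loosely mirror the `--ngram-prompt-lookup-min`, `--ngram-prompt-lookup-max` and `--num-speculative-tokens` flags.

```python
from typing import List, Sequence


def propose_ngram_draft(
    token_ids: Sequence[int],
    ngram_min: int,
    ngram_max: int,
    num_speculative_tokens: int,
) -> List[int]:
    """Hypothetical prompt-lookup proposer (illustrative sketch only).

    Tries suffix n-grams from n = ngram_max down to n = ngram_min. For each n,
    it searches earlier positions of token_ids for the same n-gram, preferring
    the most recent match, and returns up to num_speculative_tokens tokens
    that followed it. Returns an empty list when nothing matches.
    """
    length = len(token_ids)
    for n in range(ngram_max, ngram_min - 1, -1):
        # The suffix needs at least one earlier position to match against.
        if n <= 0 or n >= length:
            continue
        suffix = list(token_ids[length - n:])
        # Scan start positions right to left, skipping the suffix itself.
        for start in range(length - n - 1, -1, -1):
            if list(token_ids[start:start + n]) == suffix:
                follow = start + n
                draft = list(token_ids[follow:follow + num_speculative_tokens])
                if draft:
                    return draft
    return []


if __name__ == "__main__":
    # The suffix [1, 2, 3] also appears at position 0, followed by 4, 9.
    print(propose_ngram_draft([1, 2, 3, 4, 9, 1, 2, 3], 1, 3, 2))  # [4, 9]
```

Because the proposer only does a lookup over tokens that already exist, it costs almost nothing next to a draft model. It pays off mainly when the output repeats spans of the prompt, as in summarization, code editing or retrieval-augmented answers.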

Example usage:

aphrodite run meta-llama/Llama-2-7b-hf --speculative-model [ngram] --ngram-prompt-lookup-max 5 --ngram-prompt-lookup-min 1 --num-speculative-tokens 5 --use-v2-block-manager
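The target model still checks every proposed token, so output quality does not depend on the draft. With `--num-speculative-tokens 5`, each step proposes up to five draft tokens. The sketch below shows the acceptance step under a simplifying assumption: greedy (temperature 0) decoding. `greedy_verify` is a hypothetical name. The engine's real verifier also has to handle non-greedy sampling and is not reproduced here.

```python
from typing import List, Sequence


def greedy_verify(draft: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Hypothetical greedy acceptance step (illustrative sketch only).

    target_argmax[i] is the target model's greedy token at draft position i.
    It has len(draft) + 1 entries. The last entry is the "bonus" token,
    emitted only when every draft token is accepted.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    emitted: List[int] = []
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            # First mismatch: emit the target model's token and stop.
            emitted.append(expected)
            return emitted
        emitted.append(proposed)
    # Every draft token matched, so the bonus token is emitted too.
    emitted.append(target_argmax[-1])
    return emitted


if __name__ == "__main__":
    # Two draft tokens accepted, the third replaced by the target's token.
    print(greedy_verify([4, 9, 7], [4, 9, 2, 5]))  # [4, 9, 2]
```

Each step emits at least one token even when the first draft token is rejected. A poor n-gram match therefore wastes some verification compute but never changes the greedy output.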