Closed: AlpinDale closed this 3 weeks ago.
N-gram prompt lookup decoding mode for speculative decoding, enabling speculative decoding without a separate draft model.
Example usage:
aphrodite run meta-llama/Llama-2-7b-hf --speculative-model [ngram] --ngram-prompt-lookup-max 5 --ngram-prompt-lookup-min 1 --num-speculative-tokens 5 --use-v2-block-manager
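The general idea behind prompt lookup can be shown with a minimal Python sketch. This is not the code in this PR. It assumes that the draft step takes the trailing n-gram of the current token sequence (longest first, from `--ngram-prompt-lookup-max` down to `--ngram-prompt-lookup-min`), finds an earlier occurrence of it, and proposes up to `--num-speculative-tokens` tokens that followed that occurrence. The function name `ngram_propose` is hypothetical. Preferring the most recent earlier match is also an assumption made for illustration.

```python
from typing import List, Sequence


def ngram_propose(
    token_ids: Sequence[int],
    ngram_min: int,
    ngram_max: int,
    num_spec_tokens: int,
) -> List[int]:
    """Hypothetical sketch: propose draft tokens by prompt (n-gram) lookup.

    Assumes ngram_min >= 1. Returns an empty list when no n-gram matches,
    in which case no speculative tokens would be proposed for this step.
    """
    seq_len = len(token_ids)
    # Try the longest n-gram first and fall back to shorter ones.
    # The n-gram must be shorter than the sequence so an earlier copy can exist.
    for n in range(min(ngram_max, seq_len - 1), ngram_min - 1, -1):
        pattern = list(token_ids[seq_len - n:])
        # Scan earlier start positions, most recent first (an assumption).
        # Starting at seq_len - n - 1 excludes the trailing n-gram itself.
        for start in range(seq_len - n - 1, -1, -1):
            if list(token_ids[start:start + n]) == pattern:
                # Draft tokens are whatever followed the matched n-gram.
                return list(token_ids[start + n:start + n + num_spec_tokens])
    return []


if __name__ == "__main__":
    # The trailing 3-gram [1, 2, 3] also appears at position 0, followed by 4, 5.
    print(ngram_propose([1, 2, 3, 4, 5, 1, 2, 3], 1, 3, 2))
```

The target model then verifies these drafted tokens in a single forward pass, as with a draft model. The drafts are free to produce, which is why this mode works well when outputs repeat spans of the prompt (for example, summarization or code editing).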