Closed Puvoka closed 2 months ago
For example, 1.1B tinyllama.
Speculative decoding uses a small model to draft for a large model. The 1.1B tinyllama is already a small model. So there is no need to speculatively decode this model.
For example, 1.1B tinyllama.