Open bryanhpchiang opened 1 year ago
The paper itself suggests considering a draft that is just a bigram model, or one that simply finds the most recent sequence of words earlier in the prompt and carries on with the longest completion from there as its guess. The draft model also has a weight-retrieval overhang of its own, so you can speculate while you speculate.
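For reference, here is a rough sketch of what the "look up the recent words in the prompt" draft could look like. It is only an illustration under my own assumptions: the function name `prompt_lookup_draft` and the parameters `max_ngram` and `num_draft` are made up, it works on whitespace-split words rather than real tokenizer IDs, and it leaves out the step where the target model verifies the guess.

```python
from typing import List, Sequence


def prompt_lookup_draft(
    tokens: Sequence[str], max_ngram: int = 3, num_draft: int = 5
) -> List[str]:
    """Guess the next tokens by copying what followed an earlier
    occurrence of the current trailing n-gram (hypothetical helper)."""
    n_tokens = len(tokens)
    # Try longer suffixes first: a longer match is a more specific context.
    for n in range(min(max_ngram, n_tokens - 1), 0, -1):
        suffix = list(tokens[n_tokens - n:])
        # Scan earlier positions from most recent to oldest,
        # skipping the suffix's own position at the end.
        for start in range(n_tokens - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                # Copy up to num_draft tokens that followed the match.
                return list(tokens[start + n:start + n + num_draft])
    return []  # no earlier match, so no draft is proposed


prompt = "the cat sat on the mat and the cat".split()
draft = prompt_lookup_draft(prompt, max_ngram=3, num_draft=3)
print(draft)
```

An empty result just means there is nothing to speculate on for that step, so decoding would fall back to a normal forward pass of the large model.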
If we wanted to do speculative sampling on the 7B Llama model, do you have any recommendations for which (non-Llama) draft model to use? Thanks!