I've noticed that this knowledge distillation approach is similar to the one described in EAGLE, and it has proven to be very effective. Have you tried the knowledge distillation objectives from the paper 'DistillSpec: Improving Speculative Decoding via Knowledge Distillation', such as RKL or TVD?
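For context, here is a minimal sketch of the divergences in question, computed on plain probability vectors. This is only illustrative (function names and the toy distributions are my own, not from DistillSpec or EAGLE); in practice these would be applied token-wise to teacher/student logits.

```python
import numpy as np

def forward_kl(p_teacher, q_student, eps=1e-12):
    # FKL = KL(p || q): mode-covering; standard distillation loss.
    p, q = np.asarray(p_teacher), np.asarray(q_student)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def reverse_kl(p_teacher, q_student, eps=1e-12):
    # RKL = KL(q || p): mode-seeking; one of the variants DistillSpec compares.
    return forward_kl(q_student, p_teacher, eps)

def tvd(p_teacher, q_student):
    # Total variation distance: 0.5 * sum_i |p_i - q_i|.
    p, q = np.asarray(p_teacher), np.asarray(q_student)
    return float(0.5 * np.sum(np.abs(p - q)))

# Toy next-token distributions (hypothetical values).
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
print(forward_kl(p, q), reverse_kl(p, q), tvd(p, q))
```

Note that FKL and RKL are asymmetric (they generally differ for the same pair of distributions), while TVD is symmetric and bounded in [0, 1].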