dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
Apache License 2.0

Code for Llama-7b and Mistral #11

Closed DRXD1000 closed 5 months ago

DRXD1000 commented 6 months ago

Hey guys, amazing work!

Is there an easy way to update your code to use it with Mistral or llama-2-7b-chat-hf?

I tried llama-2, but it did not work.

Thanks!

junzhang-zj commented 6 months ago

The current code is already adapted to llama-2-7b-chat-hf. First generate the corresponding draft model with search.ipynb, then use evaluate_sum.ipynb to evaluate it.
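For intuition, the decoding loop the paper describes alternates two phases: the draft model (the same network with some layers skipped) proposes a few tokens cheaply, and the full model verifies them, keeping only the prefix it agrees with, so greedy output is unchanged. A minimal toy sketch of that loop, where both "models" are hypothetical stand-in functions over integer tokens rather than the repo's actual code:

```python
# Toy illustration of the draft-and-verify loop in self-speculative
# decoding. The model functions below are hypothetical stand-ins,
# not the repository's API.

def full_model_next(seq):
    # Stand-in for the full model's greedy next token.
    return (sum(seq) * 31 + len(seq)) % 100

def draft_model_next(seq):
    # Stand-in for the layer-skipped draft model; kept identical to
    # the full model here so every drafted token is accepted.
    return (sum(seq) * 31 + len(seq)) % 100

def self_speculative_decode(prompt, n_new, k=4):
    """Generate n_new tokens: draft up to k tokens cheaply, verify
    with the full model, keep the longest agreeing prefix (lossless
    with respect to greedy decoding)."""
    seq = list(prompt)
    target = len(prompt) + n_new
    while len(seq) < target:
        # Draft phase: propose up to k tokens with the cheap model.
        drafted = []
        for _ in range(min(k, target - len(seq))):
            drafted.append(draft_model_next(seq + drafted))
        # Verify phase: the full model checks each token in order.
        accepted = []
        for t in drafted:
            expected = full_model_next(seq + accepted)
            if expected == t:
                accepted.append(t)
            else:
                # First mismatch: take the full model's token and stop.
                accepted.append(expected)
                break
        seq.extend(accepted)
    return seq[len(prompt):]
```

Because verification always falls back to the full model's token at the first mismatch, the output matches plain greedy decoding with the full model; the speedup comes from accepting multiple drafted tokens per verification pass.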