dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
Apache License 2.0

questions about Llama2-70b #13

Closed xinlong-yang closed 4 months ago

xinlong-yang commented 5 months ago

Hi, thanks for your great work! In your paper you say you used 2 A100 GPUs and HF Accelerate to evaluate Llama2-70b. I'd like to know whether you used plain Accelerate, or Accelerate + DeepSpeed? Since the repo has no content about this, I'm a little confused. Thanks for your patience!

junzhang-zj commented 4 months ago

We only used Accelerate's `device_map='auto'` to load the model parameters across the GPUs; we did not use Accelerate or DeepSpeed for acceleration.
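For reference, this setup can be sketched roughly as below, assuming the model is loaded with Hugging Face `transformers`; the checkpoint name and dtype are illustrative, not taken from the repo.

```python
def load_sharded_model(model_name="meta-llama/Llama-2-70b-hf"):
    """Load a model sharded across available GPUs using Accelerate's
    device_map='auto'; no DeepSpeed or other acceleration is involved.

    The checkpoint name and dtype are illustrative assumptions.
    """
    # Imports are kept inside the function so the sketch stays importable
    # even where torch/transformers are not installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # device_map='auto' lets Accelerate place the layers across all
    # visible GPUs (e.g. 2 x A100) automatically at load time.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_sharded_model()
```

After loading this way, generation runs on the sharded model as usual; Accelerate handles moving activations between devices.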

xinlong-yang commented 4 months ago

OK, I got it, thanks!