dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
Apache License 2.0

Question about the training data for Bayesian optimization and model size #9

Closed irasin closed 6 months ago

irasin commented 6 months ago

Great work! I have some questions about the Bayesian optimization and the performance at smaller model sizes:

  1. The paper says that you use XSum and CNN/DM data for the Bayesian optimization. If I have a new LLaMA-based fine-tuned model, or some other open-source model trained on different source data (e.g., Baichuan/Qwen/ChatGLM), should I use some task-specific data for the Bayesian optimization?
  2. What are the results for 7B-sized models like LLaMA-7B/LLaMA-2-7B?

Hope to get an answer. Thanks a lot.

irasin commented 6 months ago

Also, I was wondering: if I change the dataset used for the Bayesian optimization on the same model, let's say XSum data on LLaMA-2-13B and some other custom data on LLaMA-13B, will the skipped layers be different in these two cases?

junzhang-zj commented 6 months ago

Thank you for your appreciation of our work. Regarding the BO dataset settings: if the tasks to be tested are very different, we recommend searching at the task level, so the skipped layers may differ across datasets. We have no plans to experiment on the 7B models yet, but their layer redundancy may be smaller.
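
For illustration, here is a minimal sketch of what such a task-level search could look like using the off-the-shelf `bayes_opt` package. This is not the repository's actual search script: `measure_speedup`, the 0.5 threshold encoding of the skip decision, and `NUM_LAYERS = 40` (the LLaMA-2-13B layer count) are illustrative assumptions; the placeholder objective would be replaced with a real measurement of self-speculative decoding throughput on your own calibration data for the target task.

```python
# Minimal sketch: task-level Bayesian optimization over which layers the
# draft model skips, using the `bayes_opt` package. Hypothetical helper
# names; not the repo's official search script.
import numpy as np
from bayes_opt import BayesianOptimization

NUM_LAYERS = 40  # e.g. LLaMA-2-13B has 40 decoder layers


def measure_speedup(skip_layers):
    """Placeholder objective: return the decoding speed (e.g. tokens/sec)
    when the draft model skips `skip_layers`. Replace this with a real
    evaluation on your own calibration prompts."""
    # Toy proxy so the sketch runs: prefer skipping about half of the layers.
    return -abs(len(skip_layers) - NUM_LAYERS // 2) + np.random.normal(scale=0.1)


def objective(**layer_scores):
    # BO works over continuous variables, so each layer gets a score in
    # [0, 1]; scores above 0.5 mean "skip this layer" in the draft model.
    skip_layers = [int(name[1:]) for name, v in layer_scores.items() if v > 0.5]
    return measure_speedup(skip_layers)


# One continuous search variable per decoder layer.
pbounds = {f"l{i}": (0.0, 1.0) for i in range(NUM_LAYERS)}
optimizer = BayesianOptimization(f=objective, pbounds=pbounds, random_state=1)
optimizer.maximize(init_points=10, n_iter=50)

best = optimizer.max
best_skips = sorted(int(k[1:]) for k, v in best["params"].items() if v > 0.5)
print("Best skipped layers found:", best_skips)
```

Running this search separately on each calibration set (e.g. XSum vs. your custom data) would generally produce different skipped-layer sets, which is why we suggest searching at the task level when the tasks differ a lot.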