dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
Apache License 2.0

Questions about modeling_llama.py #10

Closed: qiyuangong closed this issue 8 months ago

qiyuangong commented 9 months ago

Nice repo and paper!

The self-speculative decoding implementation is quite clear and straightforward compared to the Hugging Face assistant model implementation. :)

I ran some tests on 1K input, and the results look promising.

However, I have a few questions about modeling_llama.py.

  1. Will these changes be upstreamed to HF transformers, so that they can be applied to other models, e.g., Qwen, Yi, etc.?
  2. I found some code that is not necessary for inference. I don't know whether it is intended for future work.
    • bitfit_linear_forward is not used.
    • hidden_states.requires_grad_(True) seems unnecessary for inference.
    • draft_attn_skip_mask and draft_mlp_skip_mask are not used.
  3. This line does not seem correct: https://github.com/dilab-zju/self-speculative-decoding/blob/main/modeling_llama.py#L379 (compare with the corresponding line in HF transformers v4.33.1: https://github.com/huggingface/transformers/blob/v4.33.1/src/transformers/models/llama/modeling_llama.py#L697).

LorrinWWW commented 8 months ago

Thank you for your interest in our work!

  1. At present, we have no plans to integrate this repository into transformers. We envision this repository more as a tool for studying speculative decoding rather than a ready-to-deploy framework. However, we are open to contributions from those interested in re-implementing, testing, or applying this concept across a wider range of models. The custom modeling code primarily allows for the skipping of intermediate layers, making it relatively straightforward to adapt to other models.
  2. Yeah, these are not used :) We tried bitfit internally but observed some slowdown so we did not go further. hidden_states.requires_grad_(True) is not necessary for inference.
  3. Thank you for reporting. It does look like something is off. @junzhang-zj Can you confirm this and update the code?
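For readers who want to adapt the idea to another model, the layer skipping described in point 1 can be sketched roughly as follows. This is a toy illustration in plain Python, not the code in this repository: `ToyDecoderLayer`, `run_layers`, `attn_skip`, and `mlp_skip` are hypothetical names, and the "attention" and "MLP" sublayers are stand-in functions rather than real modules. It only shows the control flow: the full (verify) pass runs every sublayer, while the draft pass bypasses the selected ones and lets the residual stream pass through unchanged.

```python
from typing import Callable, FrozenSet, List, Sequence

Vector = List[float]
SubLayer = Callable[[Vector], Vector]


class ToyDecoderLayer:
    """A decoder block with two residual sublayers (hypothetical stand-ins)."""

    def __init__(self, attn: SubLayer, mlp: SubLayer) -> None:
        self.attn = attn
        self.mlp = mlp

    def forward(self, hidden: Vector, skip_attn: bool = False,
                skip_mlp: bool = False) -> Vector:
        # A skipped sublayer contributes nothing: the residual passes through.
        if not skip_attn:
            hidden = [h + a for h, a in zip(hidden, self.attn(hidden))]
        if not skip_mlp:
            hidden = [h + m for h, m in zip(hidden, self.mlp(hidden))]
        return hidden


def run_layers(layers: Sequence[ToyDecoderLayer], hidden: Vector,
               attn_skip: FrozenSet[int] = frozenset(),
               mlp_skip: FrozenSet[int] = frozenset()) -> Vector:
    """Run the stack, bypassing the sublayers whose layer index is listed."""
    for idx, layer in enumerate(layers):
        hidden = layer.forward(hidden,
                               skip_attn=idx in attn_skip,
                               skip_mlp=idx in mlp_skip)
    return hidden


# Two identical toy layers: "attention" halves the input, "MLP" adds a constant.
layers = [
    ToyDecoderLayer(attn=lambda x: [0.5 * v for v in x],
                    mlp=lambda x: [1.0 for _ in x])
    for _ in range(2)
]

# Verify pass: every sublayer runs.
full = run_layers(layers, [1.0, 2.0])
# Draft pass: skip attention in layer 0 and the MLP in layer 1 (arbitrary choice).
draft = run_layers(layers, [1.0, 2.0],
                   attn_skip=frozenset({0}), mlp_skip=frozenset({1}))

print(full)   # [4.75, 7.0]
print(draft)  # [3.0, 4.5]
```

Because both passes share the same layers and weights, porting this to another architecture mostly means threading the two skip sets through that model's decoder loop; which layers are worth skipping is a separate question that this sketch does not address.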

junzhang-zj commented 8 months ago

Thank you for your report, I have updated the code @qiyuangong.

qiyuangong commented 8 months ago

> Thank you for your report, I have updated the code @qiyuangong.

You are welcome! :)