SafeAILab / EAGLE

Official Implementation of EAGLE
https://arxiv.org/pdf/2406.16858
Apache License 2.0

Qs on self.act #63

Closed cyLi-Tiger closed 2 months ago

cyLi-Tiger commented 2 months ago

I noticed that `self.act` here is unused; what is its purpose?

I tried using the draft model (with only one transformer layer) for inference on its own, feeding the last_hidden from one round's output as the next round's input. However, I found that the hidden_states sometimes grow larger and eventually become NaN as the autoregressive process goes on. Did this happen to you? I'm guessing `self.act` is meant to keep the hidden_states from exploding, and since in speculative decoding the draft model only needs to decode a few tokens, such an explosion isn't a concern. Looking forward to your thoughts!
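To make the failure mode concrete, here is a minimal toy (not EAGLE code; the layer, shapes, and numbers are all illustrative): when a layer's output is fed straight back in as its next input, any direction in which the layer's effective gain exceeds 1 gets amplified every step, so over a long purely autoregressive run the hidden_states can overflow and then turn into NaN.

```python
import torch

torch.manual_seed(0)
hidden_size = 64

# Stand-in for a one-layer draft model, rescaled so its spectral radius is 1.2.
weight = torch.randn(hidden_size, hidden_size)
weight = 1.2 * weight / torch.linalg.eigvals(weight).abs().max()

hidden = torch.randn(1, hidden_size)
for step in range(1, 601):
    hidden = hidden @ weight.T             # last_hidden re-used as next input
    if step % 100 == 0:
        # The norm grows roughly like 1.2**step and eventually overflows to inf.
        print(step, hidden.norm().item())
```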

Liyuhui-12 commented 2 months ago

Indeed, `self.act` is not used, and it is not there to prevent the hidden_states from diverging. The input to the draft model should be the features of the base model. Long-term error accumulation might cause the hidden_states to diverge, but the draft model only needs to guess a few tokens, so this issue shouldn't arise.
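For contrast with the divergent loop above, here is a rough structural sketch of what this reply describes (the models are toy stand-ins, not EAGLE's actual API): the draft model is re-seeded from the base model's verified features every cycle and only runs for a few steps, so its own drift never compounds over a long horizon.

```python
import torch

torch.manual_seed(0)
hidden_size, draft_len = 64, 4

base_w = torch.randn(hidden_size, hidden_size) / hidden_size ** 0.5   # toy base model
draft_w = torch.randn(hidden_size, hidden_size) / hidden_size ** 0.5  # toy draft model

features = torch.randn(1, hidden_size)        # verified base-model features
for cycle in range(100):
    # Draft phase: only draft_len steps, always seeded from the base model's
    # features, so any drift in the draft's hidden states stays short-lived
    # and is discarded at the end of the cycle.
    h = features
    for _ in range(draft_len):
        h = torch.tanh(h @ draft_w.T)

    # Verify phase (stubbed): the base model recomputes features for the
    # accepted prefix; the next cycle's draft restarts from these, not from h.
    features = torch.tanh(features @ base_w.T)

print("features stay finite after 100 cycles:", features.norm().item())
```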