Closed cyLi-Tiger closed 2 months ago
Indeed, `self.act` is not being used; it is not there to prevent `hidden_states` from diverging. The input to the draft model should be the features of the base model. Long-term error accumulation might cause `hidden_states` to diverge, but the draft model only needs to guess a few tokens, so this issue shouldn't arise.
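To make the point concrete, here is a minimal sketch of that setup. Everything here is hypothetical (random NumPy matrices stand in for the real one-layer draft model and LM head, and `K` is an assumed draft depth): the draft starts from a base-model feature and only rolls its own hidden state forward a small, fixed number of steps, so error has little room to accumulate.

```python
import numpy as np

rng = np.random.default_rng(0)
D, V, K = 32, 100, 4  # hidden size, vocab size, draft depth (all hypothetical)

# Stand-ins for the real weights: a frozen random "draft layer" and LM head.
W_draft = rng.normal(scale=1.0 / np.sqrt(D), size=(D, D))
W_head = rng.normal(scale=1.0 / np.sqrt(D), size=(D, V))

def draft_tokens(base_feature, k=K):
    """Roll the one-layer draft forward k steps, feeding its own hidden
    state back in. Error can accumulate, but k stays small by design."""
    h, tokens = base_feature, []
    for _ in range(k):
        h = np.tanh(W_draft @ h)            # one draft-layer step
        tokens.append(int(np.argmax(W_head.T @ h)))
    return tokens

base_feature = rng.normal(size=D)  # feature taken from the base model
print(draft_tokens(base_feature))  # at most K guessed tokens
```

After these K guesses the base model verifies them and supplies a fresh feature, so the draft never has to stay stable over a long horizon.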
I notice you have an unused `self.act` here; what is it for? I tried using the draft model (with only one transformer layer) for inference, feeding the `last_hidden` from one round's output as the next round's input, but I found that `hidden_states` sometimes grows larger and eventually produces NaNs in `hidden_states` as the autoregressive process goes on. Did this happen to you? My guess is that `self.act` is meant to keep `hidden_states` from exploding, and that since in speculative decoding the draft model only needs to decode a few tokens, we don't need to worry about such an explosion. Looking forward to your thoughts!
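For what it's worth, the blow-up described in the question is easy to reproduce in a toy setting: repeatedly feeding a map's output back into itself makes the state norm grow geometrically whenever the map's spectral radius exceeds 1. This is purely illustrative (a random linear layer, not the actual draft model, and real layers also have nonlinearities and normalization):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
# A toy "draft layer": a random linear map scaled so its spectral radius
# is above 1, standing in for a layer with no output normalization.
W = rng.normal(scale=1.5 / np.sqrt(d), size=(d, d))

h = rng.normal(size=d)
norms = []
for step in range(50):
    h = W @ h                     # feed the last hidden state back in
    norms.append(np.linalg.norm(h))

print(f"norm after 5 steps:  {norms[4]:.3e}")
print(f"norm after 50 steps: {norms[-1]:.3e}")
```

After a handful of steps the norm is still moderate, which matches the reply: a draft that only guesses a few tokens before the base model takes over again never reaches the regime where the state overflows to NaN.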