astramind-ai / Mixture-of-depths

Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Misaligned implementation with paper #5

Closed: starsholic closed this issue 6 months ago

starsholic commented 6 months ago

Hi! In Section 3.4, Eq. 1 of the paper:

[image: screenshot of Eq. 1]

According to this, it should be output = processed_tokens + x. I wonder why only the selected tokens are added here: https://github.com/astramind-ai/Mixture-of-depths/blob/103aa4b6c211346599cc8b853cc3108bf9cb72d0/MoD/MoD.py#L93
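To make the proposed fix concrete, here is a minimal pure-Python sketch of the update as this comment reads Eq. 1: routed tokens get the router-weighted block output plus the residual, and every other token passes through unchanged. This is not the repository's code. The function name `mod_residual`, its parameters, the top-k selection rule, and the scalar "tokens" are all assumptions made for illustration.

```python
def mod_residual(x, router_weights, capacity, block):
    """Sketch of output = processed_tokens + x (hypothetical helper).

    x: list of token values (scalars here, for simplicity)
    router_weights: one router score per token
    capacity: fraction of tokens routed through the block
    block: callable standing in for the transformer block
    """
    k = max(1, int(capacity * len(x)))
    # Indices of the k tokens with the highest router weights.
    ranked = sorted(range(len(x)), key=lambda i: router_weights[i], reverse=True)
    selected = set(ranked[:k])

    # processed_tokens is zero everywhere except at the routed positions.
    processed_tokens = [0.0] * len(x)
    for i in selected:
        processed_tokens[i] = router_weights[i] * block(x[i])

    # The residual x is added for every token, routed or not.
    return [p + xi for p, xi in zip(processed_tokens, x)]


# Toy usage: the "block" just multiplies by 10 (an assumption for the demo).
out = mod_residual(
    x=[1.0, 2.0, 3.0, 4.0],
    router_weights=[0.5, 0.125, 0.75, 0.25],
    capacity=0.5,
    block=lambda v: 10.0 * v,
)
print(out)  # [6.0, 2.0, 25.5, 4.0]
```

With capacity 0.5, tokens 0 and 2 are routed and end up as weight * block(x) + x, while tokens 1 and 3 keep their input values.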

Aafiya-H commented 5 months ago

Hello, thank you so much for pointing this out. I agree with the issue and the solution you mentioned. Have you gone ahead with this fix? I think the implementation still has this issue.