astramind-ai / Mixture-of-depths

Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Can we combine this method with MoE? #3

Closed kostum123 closed 5 months ago

kostum123 commented 6 months ago

I wonder if we can combine the Mixture-of-Depths and Mixture-of-Experts methods in one model?

GiacomoLeoneMaria commented 6 months ago

In general, there should be no problem.
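
To make the idea concrete, here is a minimal sketch of one way the two could be combined; it is not this repo's implementation, and the names `MoDMoEBlock`, `MoEFeedForward`, the top-1 expert gate, and the `capacity` fraction are all assumptions for illustration. The MoD router selects the top-k tokens per sequence, only those tokens are processed by the MoE feed-forward block, and the remaining tokens pass through unchanged via the residual path.

```python
# Hypothetical sketch: Mixture-of-Depths routing wrapped around a
# Mixture-of-Experts feed-forward block (not the repo's API).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Simplified token-level MoE: each token goes to its top-1 expert."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, d_model)
        gate_probs = F.softmax(self.gate(x), dim=-1)
        expert_idx = gate_probs.argmax(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate probability so the gate stays differentiable.
                out[mask] = expert(x[mask]) * gate_probs[mask, i].unsqueeze(-1)
        return out


class MoDMoEBlock(nn.Module):
    """MoD token selection around an MoE feed-forward block."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4, capacity: float = 0.5):
        super().__init__()
        self.capacity = capacity              # fraction of tokens that receive compute
        self.router = nn.Linear(d_model, 1)   # MoD router: one scalar score per token
        self.norm = nn.LayerNorm(d_model)
        self.moe = MoEFeedForward(d_model, d_ff, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        k = max(1, int(self.capacity * seq_len))

        router_logits = self.router(x).squeeze(-1)            # (batch, seq_len)
        topk_vals, topk_idx = router_logits.topk(k, dim=-1)   # top-k tokens per sequence
        gather_idx = topk_idx.unsqueeze(-1).expand(-1, -1, d_model)

        # Run the MoE block only on the selected tokens.
        selected = torch.gather(x, 1, gather_idx)
        processed = self.moe(self.norm(selected).reshape(-1, d_model))
        processed = processed.reshape(batch, k, d_model)

        # Weight by the router score so routing receives gradients, then add the
        # result back onto the residual; unselected tokens are left unchanged.
        processed = processed * torch.sigmoid(topk_vals).unsqueeze(-1)
        out = x.clone()
        out.scatter_add_(1, gather_idx, processed)
        return out


if __name__ == "__main__":
    block = MoDMoEBlock(d_model=64, d_ff=256, num_experts=4, capacity=0.25)
    tokens = torch.randn(2, 16, 64)
    print(block(tokens).shape)  # torch.Size([2, 16, 64])
```

In this reading, MoD decides *which tokens* get compute while MoE decides *which parameters* process them, so the two routers are orthogonal and can in principle be stacked per layer; details such as capacity scheduling and load balancing for the experts are left out of the sketch.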