I'm a newcomer to mixture-of-depths, and I do think this is really innovative work for the MLLM community.
However, some questions came up while I was reading your paper. Could you explain a few of the terms you use?
Routing ratio: there seems to be no description of it in the method section. What is the difference between the routing ratio and the skip ratio?
In Table 1, I can't understand why the skip ratio of the ARank-based deployment is even higher than that of all layers. As shown in Figure 2, the ARank-based deployment only replaces some Dense Transformer Layers with MoD Transformer Layers and keeps the rest unchanged, whereas the all-layers setting replaces every Dense Transformer Layer with a MoD Transformer Layer. So I'm really confused about why the ARank-based deployment has a higher skip ratio than all layers.
Hi, thanks for your attention.
Here are my answers to your questions:
The routing ratio and the skip ratio refer to the same thing: both describe how many tokens do not take part in the self-attention operations in the layers of the LLM.
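To make this concrete, here is a minimal sketch (not the paper's actual code) of a MoD-style layer with a top-k router; the names `router_weight`, `block`, and `capacity_ratio` are assumptions for illustration. The skip ratio is simply the fraction of tokens that the router does not select, which therefore bypass the layer's attention/FFN computation:

```python
import torch

def mod_layer(tokens, router_weight, block, capacity_ratio=0.5):
    """tokens: (seq_len, dim); router_weight: (dim,); block: callable mapping (k, dim) -> (k, dim)."""
    seq_len, _ = tokens.shape
    scores = tokens @ router_weight              # one routing score per token, shape (seq_len,)
    k = max(1, int(seq_len * capacity_ratio))    # number of tokens processed by the block
    keep = scores.topk(k).indices                # tokens selected to attend self-attention
    out = tokens.clone()                         # skipped tokens pass through on the residual stream
    out[keep] = block(tokens[keep])              # only selected tokens run attention/FFN
    skip_ratio = 1.0 - k / seq_len               # fraction of tokens that skip this layer
    return out, skip_ratio
```

During training, the capacity (and hence the routing/skip ratio) is set by the chosen `capacity_ratio`; the point of the next answer is that at inference time no such ratio is imposed.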
The reason the skip ratio of the ARank-based deployment is even higher than that of all layers is that the router is optimized through training. If we simply apply routing in all layers, the router learns nothing useful, which in turn gives a low skip ratio at inference time, since at inference the routing ratio is not manually set.
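As a hedged illustration of that inference-time behavior: if the learned router scores are, say, thresholded at inference (an assumed routing rule, not necessarily the paper's exact mechanism), then the observed skip ratio is whatever the trained router produces, not a preset value:

```python
import torch

@torch.no_grad()
def inference_skip_ratio(tokens, router_weight, threshold=0.0):
    scores = tokens @ router_weight                  # learned per-token routing scores
    processed = scores > threshold                   # the router, not a manual ratio, decides
    return 1.0 - processed.float().mean().item()     # observed skip ratio at inference time
```

A poorly trained router (as in the all-layers case) tends to mark most tokens as needed, which is why its measured skip ratio comes out lower.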