PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0

[Question] How to check the activated parameters of MoE models? #45

Closed · koda-11 closed this issue 6 months ago

koda-11 commented 6 months ago

Question

Hello, thanks for the nice work!

Is there any code to check the activated parameters of MoE models?

LinB203 commented 6 months ago

Refer to Appendix A.1 (More Model Architecture) in the paper. We also provide the code below to check.

def num_param(vocab_size, hidden_size, num_hidden_layers, intermediate_size, ffn_factor, freq_moe_layer, num_experts):
    # Every `freq_moe_layer`-th layer is an MoE layer.
    num_moe_layers = num_hidden_layers // freq_moe_layer
    # Each MoE layer adds (num_experts - 1) expert FFNs on top of the original dense FFN.
    num_extra_ffns = num_moe_layers * (num_experts - 1)

    # Token embedding
    moe_num_params = vocab_size * hidden_size
    # Per layer: attention projections (Q, K, V, O), one FFN, and two norms
    moe_num_params += num_hidden_layers * (
            hidden_size * hidden_size * 4 + hidden_size * intermediate_size * ffn_factor + hidden_size * 2)
    # Final norm and output head
    moe_num_params += hidden_size + hidden_size * vocab_size
    # Extra expert FFNs introduced by the MoE layers
    moe_num_params += num_extra_ffns * (hidden_size * intermediate_size * ffn_factor + hidden_size * 2)
    # Router: one hidden_size x num_experts gate per MoE layer
    moe_num_params += num_moe_layers * (hidden_size * num_experts)

    print(f'Number of parameters of MoE Model (B) /w {num_experts} experts: {round(moe_num_params / 1e9, 2)}')
    return round(moe_num_params / 1e9, 2)

# model_qwen_1_8b = dict(vocab_size=151936,
#                 hidden_size=2048,
#                 num_hidden_layers=24,
#                 intermediate_size=5504,
#                 ffn_factor=3,
#                 freq_moe_layer=2)
#
# num_param(**model_qwen_1_8b, num_experts=1)
# num_param(**model_qwen_1_8b, num_experts=2)
# num_param(**model_qwen_1_8b, num_experts=4)
# num_param(**model_qwen_1_8b, num_experts=8)

# print('phi 2.7b')
# model_phi_2_7b = dict(vocab_size=51200,
#                 hidden_size=2560,
#                 num_hidden_layers=32,
#                 intermediate_size=10240,
#                 ffn_factor=2,
#                 freq_moe_layer=2)
#
# num_param(**model_phi_2_7b, num_experts=1)
# num_param(**model_phi_2_7b, num_experts=2)
# num_param(**model_phi_2_7b, num_experts=4)

# print('stablelm 1.6b')
# model_stablelm_1_6b = dict(vocab_size=100352,
#                 hidden_size=2048,
#                 num_hidden_layers=24,
#                 intermediate_size=5632,
#                 ffn_factor=3,
#                 freq_moe_layer=2)
#
# num_param(**model_stablelm_1_6b, num_experts=1)
# num_param(**model_stablelm_1_6b, num_experts=2)
# num_param(**model_stablelm_1_6b, num_experts=4)

# print('llama 7b')
# model_7b = dict(vocab_size=32000,
#                 hidden_size=4096,
#                 num_hidden_layers=32,
#                 intermediate_size=11008,
#                 ffn_factor=3,
#                 freq_moe_layer=2)

koda-11 commented 6 months ago

Thanks