According to the model card on Hugging Face:
DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA).
However, when I run
```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained('/models/dbrx-instruct/')
print(config.ffn_config)
```
It shows:
```
DbrxFFNConfig {
  "ffn_act_fn": {
    "name": "silu"
  },
  "ffn_hidden_size": 10752,
  "moe_jitter_eps": 0,
  "moe_loss_weight": 0.05,
  "moe_normalize_expert_weights": 1,
  "moe_num_experts": 16,
  "moe_top_k": 4,
  "transformers_version": "4.38.1",
  "uniform_expert_assignment": false
}
```
This is somewhat misleading and confusing: the config only mentions SiLU, not GLU.
Hi @jcao-ai, SiLU is the activation function used inside the GLU; GLU (gated linear units) is the FFN structure. You can read more about this in the paper. Hope this helps.
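For illustration, here is a minimal sketch of what "SiLU inside a GLU" means, assuming a PyTorch-style FFN block. The layer names (`w1`, `v1`, `w2`) are hypothetical and not DBRX's actual parameter names:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFFN(nn.Module):
    """Sketch of a gated-linear-unit (GLU) feed-forward block with a SiLU gate."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        self.w1 = nn.Linear(hidden_size, ffn_hidden_size, bias=False)  # gate projection
        self.v1 = nn.Linear(hidden_size, ffn_hidden_size, bias=False)  # up projection
        self.w2 = nn.Linear(ffn_hidden_size, hidden_size, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU is applied to the gate branch, then multiplied elementwise with
        # the linear branch; this gated combination is the GLU structure.
        return self.w2(F.silu(self.w1(x)) * self.v1(x))
```

So the `"silu"` shown in `ffn_act_fn` refers to the gate activation used inside this GLU structure, not to a plain non-gated FFN.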