allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

What is the true MLP ratio for OLMo 7B? #625

Closed: jeqcho closed this issue 3 weeks ago

jeqcho commented 3 weeks ago

Hi,

In the paper, an mlp_ratio of ~8/3 is reported for the OLMo 7B model.

[image: paper-table]

However, in the configuration file, the d_model is listed as 4096 and the mlp_hidden_size as 22016. This gives an mlp_ratio of 22016 / 4096 = 5.375, which differs significantly from the reported 8/3 (approximately 2.67).

Here is the relevant section of the configuration file: https://github.com/allenai/OLMo/blob/ddc884712e991608b69f7f6c04f464d5304f19d3/configs/official/OLMo-7B.yaml#L10-L14

Additionally, it is mentioned here that mlp_hidden_size = mlp_ratio * d_model: https://github.com/allenai/OLMo/blob/ddc884712e991608b69f7f6c04f464d5304f19d3/olmo/config.py#L262-L271
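
To make the mismatch concrete, here is a quick check with the values copied from the config (just illustrative arithmetic, not OLMo code):

```python
# Values copied from configs/official/OLMo-7B.yaml
d_model = 4096
mlp_hidden_size = 22016

# Naive ratio computed directly from the config values
print(mlp_hidden_size / d_model)  # 5.375, not the ~8/3 (~2.67) reported in the paper
```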

2015aroras commented 3 weeks ago

Our mlp_hidden_size is multiplied by a constant that depends on the activation function:

https://github.com/allenai/OLMo/blob/2417b1176480e3b0d72ac225c558400862fc4c81/olmo/model.py#L451-L456

Our SwiGLU activation uses 0.5, so the hidden size is halved. As a result, the effective MLP ratio is 22016 / 4096 / 2 = 2.6875 ≈ 8/3.
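
For illustration, here is a minimal sketch of how that factor of 2 arises. This is a simplified stand-in, not the actual olmo/model.py implementation, and the class name SwiGLUMLP is hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Illustrative SwiGLU-style MLP: the up projection outputs mlp_hidden_size
    features, but chunking into value and gate halves means the effective
    hidden width is mlp_hidden_size // 2."""

    def __init__(self, d_model: int, mlp_hidden_size: int):
        super().__init__()
        self.ff_proj = nn.Linear(d_model, mlp_hidden_size, bias=False)
        # output_multiplier = 0.5 for SwiGLU, so the down projection sees half the width
        self.ff_out = nn.Linear(mlp_hidden_size // 2, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x, gate = self.ff_proj(x).chunk(2, dim=-1)  # split into value and gate halves
        return self.ff_out(F.silu(gate) * x)

d_model, mlp_hidden_size = 4096, 22016
print(mlp_hidden_size / 2 / d_model)  # 2.6875, i.e. ~8/3
```

The up projection still produces mlp_hidden_size outputs, but after the chunk only mlp_hidden_size // 2 features flow through the rest of the MLP, which is the width the paper's ratio counts.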

jeqcho commented 3 weeks ago

Thanks for the clarification!