bpwl0121 opened this issue 2 months ago
@bpwl0121 - thank you for the question. The two models (OLMo-7B and OLMo-7B-Twin-2T) are identical except for differences in hardware and initialization. We showed in an experiment that hardware isn't the cause of the loss spikes.
The reason is the difference in initialization. However, these are all "fast spikes" that recover quickly with no apparent harm.
Please let me know if this answers your question.
But I found that both used the "mitchell" init method. Correct me if I am wrong.
@bpwl0121 - it is correct that both models use the "mitchell" initialization method. The difference in initialization I was referring to is the difference in the values the model parameters received at initialization time: the "mitchell" method specifies a probability distribution for the parameters, not the exact parameters.
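To make the distinction concrete, here is a minimal sketch of a "mitchell"-style initializer. It assumes each weight is drawn from a normal distribution with standard deviation 1/sqrt(d_in), truncated at ±cutoff_factor standard deviations; the function name `mitchell_like_init`, the std formula, and the use of Python's `random` module are illustrative assumptions, not OLMo's actual code. The point it shows: one method with identical settings still produces different parameters whenever the random seed differs.

```python
import math
import random


def mitchell_like_init(d_in: int, n: int, seed: int, cutoff_factor: float = 3.0) -> list[float]:
    """Draw n weights from a normal with std 1/sqrt(d_in), truncated at +/- cutoff_factor * std.

    Hypothetical sketch: the std formula and truncation rule are assumptions
    for illustration, not OLMo's implementation.
    """
    rng = random.Random(seed)  # the seed, not the init method, fixes the exact values
    std = 1.0 / math.sqrt(d_in)
    bound = cutoff_factor * std
    weights = []
    while len(weights) < n:
        x = rng.gauss(0.0, std)
        if -bound <= x <= bound:  # rejection sampling implements the truncation
            weights.append(x)
    return weights


# Same method and same distribution, but different seeds give different parameters.
run_a = mitchell_like_init(d_in=4096, n=5, seed=1)
run_b = mitchell_like_init(d_in=4096, n=5, seed=2)
print(run_a == run_b)  # False: the draws differ
print(run_a == mitchell_like_init(d_in=4096, n=5, seed=1))  # True: same seed reproduces them
```

Rejection sampling is used here only because it is the simplest correct way to sample a truncated normal with the standard library; a real training codebase would use a framework routine instead.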
You can verify that the initial parameters are different in the two models by comparing the checkpoints at step #0. Please let me know if this answers your question.
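A minimal sketch of that comparison, assuming each step-0 checkpoint has already been loaded into a mapping from parameter name to a flat list of values (with real checkpoints you would load them with your framework first and flatten the tensors). The helper `max_abs_diff`, the parameter name `layer0.weight`, and the toy values are hypothetical, made up for illustration. Any nonzero difference means the two runs started from different initial parameters.

```python
def max_abs_diff(state_a: dict[str, list[float]], state_b: dict[str, list[float]]) -> dict[str, float]:
    """Return the largest absolute elementwise difference for each parameter name."""
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints contain different parameter names")
    diffs = {}
    for name, values_a in state_a.items():
        values_b = state_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for {name}")
        # default=0.0 handles empty parameters without raising
        diffs[name] = max((abs(a - b) for a, b in zip(values_a, values_b)), default=0.0)
    return diffs


# Toy stand-ins for two step-0 checkpoints (made-up values, for illustration only).
step0_model = {"layer0.weight": [0.010, -0.020, 0.005]}
step0_twin = {"layer0.weight": [-0.004, 0.013, 0.021]}
print(max_abs_diff(step0_model, step0_twin))
```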
@dumitrac thanks for your explanation,
but I just found the … with …, and it is false for both the twin and non-twin versions.
So where can I find the right init parameter you mention, and the checkpoints at step #0?
Thanks
What do you mean by "find the right parameter for init"? What's the parameter you are missing?
I can't remember exactly, but how do I set different values for the "mitchell" initialization method? I think I found the same init method in both training setups.
The `mitchell` init method uses no other parameters. Well, I guess it uses `cutoff_factor`, but you basically never have to touch that one.
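One way to see why the cutoff rarely matters, assuming `cutoff_factor` truncates a normal distribution at ±cutoff_factor standard deviations (an assumption, not confirmed in this thread): the fraction of untruncated draws that would fall outside that range is tiny. At a cutoff of 3 standard deviations, for example, only about 0.27% of draws lie outside the bounds. The helper `fraction_outside` is hypothetical.

```python
import math


def fraction_outside(cutoff_factor: float) -> float:
    """Probability that an untruncated normal draw lies beyond +/- cutoff_factor * std."""
    # P(|Z| > k) = 1 - erf(k / sqrt(2)) for a standard normal Z
    return 1.0 - math.erf(cutoff_factor / math.sqrt(2.0))


for k in (2.0, 3.0, 4.0):
    print(f"cutoff_factor={k}: {fraction_outside(k):.4%} of draws lie outside the cutoff")
```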
❓ The question
Hi,
thanks for your awesome open-source work! I have a question regarding the loss spikes during training. Do you know why the spikes occur? And, judging from your wandb board, why do the spikes occur ONLY in the Twin version? As far as I know, you just used different hardware?
Twin version: ![image](https://github.com/allenai/OLMo/assets/38959389/b2586828-62ab-4da2-a1bf-309974fb8425)
Normal version: ![image](https://github.com/allenai/OLMo/assets/38959389/4da24284-e438-4a40-af3d-a84660e1c3da)