allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Try longer warmup (5k) at the 7B scale with mitch init, normal init and fan-in init #246

Closed (dirkgr closed this issue 5 months ago)

dirkgr commented 1 year ago

This should be a very easy thing to try. Just one setting in the config.
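To make the effect of that one setting concrete, here is a minimal sketch of a linear warmup schedule. It is an illustration only and is not taken from the OLMo code: the function name `warmup_lr`, the peak learning rate, and the 2k baseline warmup are hypothetical values chosen for the comparison.

```python
def warmup_lr(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Linear warmup from 0 to peak_lr over warmup_steps, then hold at peak_lr.

    Hypothetical helper for illustration; any decay after warmup is omitted.
    """
    if warmup_steps <= 0:
        return peak_lr
    return peak_lr * min(1.0, step / warmup_steps)


PEAK_LR = 3e-4  # assumed value, not from the issue

# Compare a shorter (assumed 2k) warmup with the 5k warmup proposed here.
for warmup_steps in (2_000, 5_000):
    for step in (0, 1_000, 2_000, 5_000):
        lr = warmup_lr(step, PEAK_LR, warmup_steps)
        print(f"warmup={warmup_steps:>5} step={step:>5} lr={lr:.2e}")
```

With a longer warmup the learning rate at any early step is lower, so the first few thousand updates are smaller, which is why it is a cheap thing to try against early instability.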

ananyahjha93 commented 1 year ago

Blocked by Jonathan for now. I will start as soon as I get the green light on the Mosaic cluster.

ananyahjha93 commented 1 year ago

Started the longer warmup + mitch init run: https://wandb.ai/ai2-llm/olmo-medium/runs/p4zz4gid

ananyahjha93 commented 1 year ago

The run seems to have crashed after 1300 steps; restarting from the checkpoint.

ananyahjha93 commented 1 year ago

mitch init + longer warmup to 10k: https://wandb.ai/ai2-llm/olmo-medium/runs/mdzkub4i

ananyahjha93 commented 1 year ago

normal init + long warmup: https://wandb.ai/ai2-llm/olmo-medium/runs/iuttxpt6

ananyahjha93 commented 1 year ago

fan_in + long warmup: https://wandb.ai/ai2-llm/olmo-medium/runs/zrm8mbw8

dumitrac commented 5 months ago

Marking the items prior to Feb 29th as "closed".