allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Normal baselines #618

Open AkshitaB opened 3 weeks ago

AkshitaB commented 3 weeks ago

Llamaish setup with the following config:

--fused_loss=false
--model.init_fn=normal
--model.init_std=0.02
--model.init_cutoff_factor=3
--scheduler.warmup_min_lr=0
--scheduler.grad_clip_warmup_steps=null
--model.clip_qkv=null
--scheduler.units=steps
--scheduler.t_warmup=2000
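
For readers unfamiliar with these overrides, here is a minimal standard-library sketch of what the initialization and warmup flags are assumed to mean; it is not OLMo's actual code. It assumes `init_fn=normal` with `init_std=0.02` and `init_cutoff_factor=3` draws weights from a zero-mean normal truncated at ±3 standard deviations, and that `units=steps`, `t_warmup=2000`, `warmup_min_lr=0` give a linear ramp from 0 to the peak learning rate over the first 2000 optimizer steps. The helper names and `PEAK_LR` are hypothetical, since the peak learning rate is not stated in this PR.

```python
import random


def truncated_normal(std=0.02, cutoff_factor=3.0, rng=random):
    """Draw one value from N(0, std^2), resampling until it lies
    within +/- cutoff_factor * std (simple rejection sampling)."""
    bound = cutoff_factor * std
    while True:
        x = rng.gauss(0.0, std)
        if -bound <= x <= bound:
            return x


def linear_warmup_lr(step, peak_lr, t_warmup=2000, warmup_min_lr=0.0):
    """Learning rate during linear warmup, with `step` counted in
    optimizer steps. After `t_warmup` steps this returns `peak_lr`;
    the scheduler's later decay phase is not modeled here."""
    frac = min(step, t_warmup) / t_warmup
    return warmup_min_lr + (peak_lr - warmup_min_lr) * frac


# Hypothetical peak learning rate, for illustration only.
PEAK_LR = 3e-4

rng = random.Random(0)
weights = [truncated_normal(rng=rng) for _ in range(10_000)]
print("largest |w|:", max(abs(w) for w in weights))

for step in (0, 500, 1000, 2000, 4000):
    print(step, linear_warmup_lr(step, PEAK_LR))
```

With a cutoff factor of 3, only a small fraction of draws is rejected, so the truncation mainly guards against rare large outliers rather than changing the bulk of the distribution.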