Open AkshitaB opened 3 weeks ago
Llamaish setup with the following config:
--fused_loss=false --model.init_fn=normal --model.init_std=0.02 --model.init_cutoff_factor=3 --scheduler.warmup_min_lr=0 --scheduler.grad_clip_warmup_steps=null --model.clip_qkv=null --scheduler.units=steps --scheduler.t_warmup=2000
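For readers unfamiliar with these flags, here is a rough sketch of what the init and warmup settings appear to describe. It is an interpretation, not code from the training repo. It assumes `init_cutoff_factor=3` means a normal draw truncated at 3 standard deviations, and that `warmup_min_lr=0` with `t_warmup=2000` means a linear ramp from 0 to the peak learning rate over 2000 steps. The function names and the `peak_lr` value are hypothetical, since the config above does not state a learning rate.

```python
import random


def warmup_lr(step, peak_lr, t_warmup=2000, warmup_min_lr=0.0):
    """Linear warmup from warmup_min_lr to peak_lr over t_warmup steps (assumed behavior)."""
    if step >= t_warmup:
        return peak_lr
    return warmup_min_lr + (peak_lr - warmup_min_lr) * step / t_warmup


def truncated_normal(rng, std=0.02, cutoff_factor=3.0):
    """Draw from N(0, std^2), rejecting samples beyond cutoff_factor * std (assumed behavior)."""
    bound = cutoff_factor * std
    while True:
        x = rng.gauss(0.0, std)
        if abs(x) <= bound:
            return x


if __name__ == "__main__":
    peak_lr = 3e-4  # hypothetical value, not given in the issue
    for step in (0, 500, 1000, 2000):
        print(step, warmup_lr(step, peak_lr))
    rng = random.Random(0)
    print([truncated_normal(rng) for _ in range(3)])
```

Under this reading, every initial weight lies within ±0.06 and the learning rate is exactly 0 at step 0.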