Closed: zankner closed this issue 7 months ago

I'm trying to replicate the training results for the 7B head. Could you share the training config used in main.py, please?

The training configuration is as follows:
```python
train_config = {
    "lr": 3e-5,
    "bs": 4,
    "gradient_accumulation_steps": 1,
    "is_warmup": True,
    "num_epochs": 200,
    "num_warmup_steps": 2000,
    "total_steps": 800000,
    "p_w": 0.1,
    "v_w": 1.0,
    "head_w": 0.1,
    "num_workers": 2,
    "embeding": True,
    "act": "No",
    "data_noise": True,
    "noise": "uniform",
    "mean": 0.0,
    "std": 0.2,
    "residual": "true,norm",
    "max_len": 2048,
    "config_path": "config.json",
    "b1": 0.9,
    "b2": 0.95,
    "grad_clip": 0.5,
}
```
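For anyone wiring this up without access to the repo's main.py: a minimal sketch of how the optimizer-related fields (lr, b1, b2, num_warmup_steps, total_steps, grad_clip) would typically map onto a standard PyTorch AdamW plus linear warmup/decay setup. The `cfg` subset, the placeholder `model`, and the `lr_lambda` schedule are my assumptions, not the repo's actual code.

```python
# Sketch only: illustrative mapping of the config above onto PyTorch,
# not the repo's actual main.py training loop.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

cfg = {  # subset of the train_config above
    "lr": 3e-5, "b1": 0.9, "b2": 0.95,
    "num_warmup_steps": 2000, "total_steps": 800000, "grad_clip": 0.5,
}

model = torch.nn.Linear(4096, 4096)  # placeholder for the trained head

optimizer = AdamW(
    model.parameters(),
    lr=cfg["lr"],
    betas=(cfg["b1"], cfg["b2"]),  # b1/b2 from the config
)

def lr_lambda(step: int) -> float:
    # Assumed schedule: linear warmup to num_warmup_steps,
    # then linear decay toward total_steps.
    if step < cfg["num_warmup_steps"]:
        return step / max(1, cfg["num_warmup_steps"])
    remaining = cfg["total_steps"] - step
    decay_span = cfg["total_steps"] - cfg["num_warmup_steps"]
    return max(0.0, remaining / decay_span)

scheduler = LambdaLR(optimizer, lr_lambda)

# Each training step, clip gradients before optimizer.step():
# torch.nn.utils.clip_grad_norm_(model.parameters(), cfg["grad_clip"])
```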
Is the global batch size 128 in the end?
The global batch size is 16.
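For anyone doing the replication arithmetic: global batch size is per-device batch size × gradient_accumulation_steps × number of data-parallel processes. With bs=4 and no accumulation, 16 would imply 4 GPUs; the GPU count is my inference and is not stated in the thread.

```python
# Inferred breakdown (world_size is an assumption, not stated in the thread)
per_device_bs = 4   # "bs" in train_config
grad_accum = 1      # "gradient_accumulation_steps"
world_size = 4      # assumed number of data-parallel GPUs
global_bs = per_device_bs * grad_accum * world_size
assert global_bs == 16
```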