SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
https://arxiv.org/pdf/2406.16858
Apache License 2.0

train configuration #14

Closed je1lee closed 9 months ago

je1lee commented 9 months ago

train_config={ "lr":args.lr, "bs":args.bs, "gradient_accumulation_steps":args.gradient_accumulation_steps, "datapath":f"{args.tmpdir}", "is_warmup":True, "num_epochs":200, "num_warmup_steps":2000, "total_steps":800000, "p_w":0.1, "v_w":1.0, "head_w":0.1, "num_workers":2, "embeding":True, "act":"No", "data_noise":True, "noise":"uniform", "mean":0.0, "std":0.2, "residual":"true,norm", "max_len":2048, "config_path":args.configpath, "b1":0.9, "b2": 0.95, "grad_clip": 0.5, } I'm trying to retrain the autoregression head with your train code.

Is this train_config used for every autoregression head on the HF Hub? The number of epochs seems too high to me. If this isn't the exact config used to train this model (https://huggingface.co/yuhuili/EAGLE-llama2-chat-70B), could you share the train_config used for training yuhuili/EAGLE-llama2-chat-70B?

Liyuhui-12 commented 9 months ago

We did not stop training based on the "num_epochs" parameter, which was set arbitrarily; in reality, we trained for only 20 epochs. Limited by VRAM (we did not have A100 80G GPUs), we also set "max_len" to 1200. This parameter truncates the training sequences: the larger it is, the more training data is used and the better the results. If you have sufficient resources, you can try a larger "max_len". Below is our training configuration for LLaMA2-Chat 70B.

train_config={ "lr":3e-5, "bs":4, "gradient_accumulation_steps":8, "datapath":f"{args.tmpdir}", "is_warmup":True, "num_epochs":200, "num_warmup_steps":2000, "total_steps":800000, "p_w":0.1, "v_w":1.0, "head_w":0.1, "num_workers":2, "embeding":True, "act":"No", "data_noise":True, "noise":"uniform", "mean":0.0, "std":0.2, "residual":"true,norm", "max_len":1200, "config_path":"config.json", "b1":0.9, "b2": 0.95, "grad_clip": 0.5, }