Hi,
Judging by the naming code in the project, your best model ("experiments/all_0729_sd11_lr0.0001_bs2_ga16/epoch43_trloss0.56_gpt2") appears to have been trained with seed=11, lr=1e-4, batch_size=2, and gradient_accumulation_steps=16.
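To make my reading of the directory name explicit, here is a minimal sketch of how I am decoding it. The prefix meanings (`sd` = seed, `lr` = learning rate, `bs` = batch size, `ga` = gradient accumulation steps) and the helper `parse_experiment_path` are my own assumptions, not taken from the project code, so please correct me if the convention is different.

```python
import re

# Assumed meaning of each prefix in the experiment path.
# This mapping is my interpretation and is not confirmed by the authors.
PATTERNS = {
    "seed": (r"_sd(\d+)", int),
    "lr": (r"_lr([0-9.eE+-]+)", float),
    "batch_size": (r"_bs(\d+)", int),
    "gradient_accumulation_steps": (r"_ga(\d+)", int),
    "epoch": (r"epoch(\d+)", int),
    "train_loss": (r"trloss([0-9.]+)", float),
}


def parse_experiment_path(path):
    """Hypothetical helper: extract hyperparameters encoded in an experiment path."""
    params = {}
    for name, (pattern, cast) in PATTERNS.items():
        match = re.search(pattern, path)
        if match:  # skip fields that are not present in the path
            params[name] = cast(match.group(1))
    return params


params = parse_experiment_path(
    "experiments/all_0729_sd11_lr0.0001_bs2_ga16/epoch43_trloss0.56_gpt2"
)
print(params)
```

If this reading is right, the released checkpoint corresponds to seed 11, epoch 43, and a training loss of 0.56.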
I am trying to train the model with the command provided in the README:
python train.py -mode train -cfg gpt_path=distilgpt2 lr=1e-4 warmup_steps=2000 gradient_accumulation_steps=16 batch_size=2 epoch_num=60 exp_no=bestmodel
However, I cannot reproduce the best model: at epoch 43 my run reaches trloss=0.59 rather than 0.56. Were any parameters set differently during your training?
Thanks!