Parameters for the best tuned model "experiments/all_0729_sd11_lr0.0001_bs2_ga16/epoch43_trloss0.56_gpt2"

Hi,

I notice that, according to the naming code in the project, your best model ("experiments/all_0729_sd11_lr0.0001_bs2_ga16/epoch43_trloss0.56_gpt2") should have been trained with seed=11, lr=1e-4, batch_size=2 and gradient_accumulation_steps=16.
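
To make my reading of the name explicit, here is a minimal sketch of how I decode it. The helper `parse_exp_name` and the prefix table are my own assumptions for illustration and are not taken from the repository; I am only guessing that `sd`, `lr`, `bs` and `ga` stand for seed, learning rate, batch size and gradient accumulation steps.

```python
import re

# Assumed prefix -> (parameter name, type) mapping, inferred from the path only.
PREFIXES = {
    "sd": ("seed", int),
    "lr": ("lr", float),
    "bs": ("batch_size", int),
    "ga": ("gradient_accumulation_steps", int),
    "epoch": ("epoch", int),
    "trloss": ("trloss", float),
}

def parse_exp_name(path):
    """Hypothetical helper: read hyperparameters out of an experiment path."""
    params = {}
    # Treat both "/" and "_" as token separators.
    for token in path.replace("/", "_").split("_"):
        match = re.fullmatch(r"([a-z]+)([\d.]+)", token)
        if match and match.group(1) in PREFIXES:
            name, cast = PREFIXES[match.group(1)]
            params[name] = cast(match.group(2))
    return params

params = parse_exp_name(
    "experiments/all_0729_sd11_lr0.0001_bs2_ga16/epoch43_trloss0.56_gpt2"
)
print(params)
# Examples per optimizer step if gradients are accumulated as I assume.
print(params["batch_size"] * params["gradient_accumulation_steps"])
```

If this reading is right, the run used a seed of 11, which the README command below does not set explicitly.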

I am trying to train the model using the command provided in the README: python train.py -mode train -cfg gpt_path=distilgpt2 lr=1e-4 warmup_steps=2000 gradient_accumulation_steps=16 batch_size=2 epoch_num=60 exp_no=bestmodel

However, I cannot reproduce the best tuned model: the model I trained reaches trloss=0.59 at epoch 43, not 0.56. I am therefore wondering whether some parameters were set differently during your training.

Thanks!

TonyNemo / UBAR-MultiWOZ