Hi @gizacard,
Thanks for your awesome project. I would like to know the hyperparameters for finetuning T5-base.
You have only shared the T5-large hyperparameters in the tutorial, as follows. Could you also share the T5-base ones?
python train_reader.py \
    --use_checkpoint \
    --lr 0.00005 \
    --optim adamw \
    --scheduler linear \
    --weight_decay 0.01 \
    --text_maxlength 250 \
    --per_gpu_batch_size 1 \
    --n_context 100 \
    --total_step 15000 \
    --warmup_step 1000 \
Thanks, looking forward to your reply.
Hi, we used a learning rate of 1e-4 for the base model; the rest should be similar.
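Putting the two together, a sketch of the T5-base command under that reply's assumptions: only the learning rate changes from the T5-large command above, and the --model_size flag is an assumption about how the repo selects the base model (check the repo's options for the actual flag name and default):

# Sketch, not an official command: same flags as the T5-large run,
# with --lr raised to 1e-4 per the maintainer's reply.
# --model_size base is assumed; verify against the repo's options.
python train_reader.py \
    --use_checkpoint \
    --model_size base \
    --lr 0.0001 \
    --optim adamw \
    --scheduler linear \
    --weight_decay 0.01 \
    --text_maxlength 250 \
    --per_gpu_batch_size 1 \
    --n_context 100 \
    --total_step 15000 \
    --warmup_step 1000 \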