facebookresearch / FiD

Fusion-in-Decoder

About the hyperparameters of finetuning t5-base #11

Closed shunyuzh closed 2 years ago

shunyuzh commented 2 years ago

Hi @gizacard ,

Thanks for your awesome project. I would like to know the hyperparameters for fine-tuning T5-base.

You have only shared T5-large's hyperparameters in the tutorial, as follows. Could you share T5-base's as well?

python train_reader.py \
        --use_checkpoint \
        --lr 0.00005 \
        --optim adamw \
        --scheduler linear \
        --weight_decay 0.01 \
        --text_maxlength 250 \
        --per_gpu_batch_size 1 \
        --n_context 100 \
        --total_step 15000 \
        --warmup_step 1000

Thanks, looking forward to your reply.

gizacard commented 2 years ago

Hi, we used a learning rate of 1e-4 for the base model; the rest should be similar.
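
For reference, a minimal sketch of the corresponding T5-base command, assuming everything else matches the T5-large recipe quoted above. Only the learning rate changes per this reply; the --model_size flag is an assumption based on the repository's README and is not confirmed in this thread:

# Assumed T5-base variant: --lr set to 1e-4 per the reply above;
# --model_size base is an assumption from the FiD README, all other
# flags copied unchanged from the T5-large command in the question.
python train_reader.py \
        --model_size base \
        --use_checkpoint \
        --lr 0.0001 \
        --optim adamw \
        --scheduler linear \
        --weight_decay 0.01 \
        --text_maxlength 250 \
        --per_gpu_batch_size 1 \
        --n_context 100 \
        --total_step 15000 \
        --warmup_step 1000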