BERT hyperparameters. Following the common BERT fine-tuning procedure, we keep a dropout rate of 0.1, and try learning rates of 1e-5, 2e-5 and 5e-5 and batch sizes of 32 and 128. We also tune the number of steps, ranging from 30 to 100k, for various data sizes.
What does the batch size here refer to? Is it the combined batch size of the supervised and unsupervised data, or just the supervised training batch size?
Also, when comparing UDA against normal training, should the hyperparameters be chosen based on the total batch size or only the supervised batch size? For example, if one set of hyperparameters is tuned for batch size 32 and another for batch size 16, and I run UDA with a supervised batch size of 16 and an unsupervised batch size of 16, which set should I use?