Closed eldarkurtic closed 2 years ago
Hi,
As mentioned in the paper, we do a simple grid search over some of the training parameters. When you see several options for a hyperparameter, it means we ran experiments for all possible combinations and picked the best one according to the evaluation set.
For example, when you see learning_rate={1.5e-4, 1.8e-4} and warmup_ratio={0, 0.01, 0.1}, it means we ran all 6 combinations and chose the best one.
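To make the combinatorics concrete, here is a minimal sketch of that kind of grid search. It is not the actual training code: `evaluate` is a hypothetical stand-in for "train with this configuration and score it on the evaluation set", and its dummy formula exists only so the snippet runs.

```python
from itertools import product

# Candidate values taken from the example above.
learning_rates = [1.5e-4, 1.8e-4]
warmup_ratios = [0, 0.01, 0.1]


def evaluate(learning_rate, warmup_ratio):
    """Hypothetical placeholder: a real run would train a model with these
    settings and return its evaluation-set metric. The formula below is a
    dummy score, not a real result."""
    return -abs(learning_rate - 1.8e-4) - abs(warmup_ratio - 0.01)


# Cartesian product of the two sets: 2 * 3 = 6 configurations.
grid = list(product(learning_rates, warmup_ratios))

# Score every configuration and keep the one with the highest score.
scores = {config: evaluate(*config) for config in grid}
best = max(scores, key=scores.get)

print(f"{len(grid)} combinations tried, best (by the dummy score): {best}")
```

Each extra hyperparameter with k options multiplies the number of runs by k, which is why only a few parameters are searched this way.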
Okay, seems like I have misinterpreted them completely. Thanks for the clarification.
Hi, I have a few questions about the hyperparameters in Table 6: