How to interpret hyperparams?

eldarkurtic commented 2 years ago

Hi, I have a few questions about the hyperparams in Table 6:

Since there are three models, {BERT-Base, BERT-Large, DistilBERT}, how should I interpret the learning rate for SQuAD, which lists only two values: {1.5e-4, 1.8e-4}?
I assume that for GLUE, {1e-4, 1.2e-4, 1.5e-5} are the learning rates for each model respectively. Is this correct?
Since the weight decay row has only two values, {0, 0.01}, I assume 0 is for all models on SQuAD and 0.01 is for all models on GLUE?
Since the warmup ratio row has three values, {0, 0.01, 0.1}, I assume these are for each model respectively, no matter which dataset is used?
Does "Epochs {3, 6, 9}" for GLUE mean BERT-Base is tuned for 3 epochs, BERT-Large for 6, and DistilBERT for 9?

ofirzaf commented 2 years ago

Hi,

As mentioned in the paper, we do a simple grid search over some of the training parameters. When you see several options for a hyperparameter, it means we ran experiments for all possible combinations and picked the best one according to the evaluation set.

For example, when you see learning_rate={1.5e-4, 1.8e-4} and warmup_ratio={0, 0.01, 0.1} it means we ran all 6 combinations and chose the best one.
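To make the counting concrete, here is a minimal sketch of that kind of grid search in plain Python. It is not the code used for the paper: the names `search_space`, `grid`, `pick_best`, and the `evaluate` callable are hypothetical, and `evaluate` is assumed to train with the given config and return an evaluation-set score where higher is better.

```python
from itertools import product

# Hypothetical search space mirroring the example above.
search_space = {
    "learning_rate": [1.5e-4, 1.8e-4],
    "warmup_ratio": [0, 0.01, 0.1],
}


def grid(space):
    """Return every combination of the listed values as a list of dicts."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def pick_best(configs, evaluate):
    """Return the config with the highest score.

    `evaluate` is an assumed user-supplied callable that takes one config
    dict and returns a number (for example, an evaluation-set metric).
    """
    return max(configs, key=evaluate)


configs = grid(search_space)
print(len(configs))  # 2 learning rates x 3 warmup ratios
```

Each entry in `configs` is one full training configuration, so every value in a row of the table is tried with every model rather than being assigned to a single model.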

eldarkurtic commented 2 years ago

Okay, it seems I misinterpreted them completely. Thanks for the clarification.
