IntelLabs / Model-Compression-Research-Package

A library for researching neural networks compression and acceleration methods.
Apache License 2.0

How to interpret hyperparams? #6

Closed by eldarkurtic 2 years ago

eldarkurtic commented 2 years ago

Hi, I have a few questions about the hyperparameters in Table 6:

  1. Since there are three models, {BERT-Base, BERT-Large, DistilBERT}, how should I interpret the learning rate for SQuAD, which has only two values: {1.5e-4, 1.8e-4}?
  2. I assume that for GLUE, {1e-4, 1.2e-4, 1.5e-5} are the learning rates for each model, respectively. Is this correct?
  3. Since the weight decay row has only two values, {0, 0.01}, I assume 0 is used for all models on SQuAD and 0.01 for all models on GLUE. Is that right?
  4. Since the warmup ratio row has three values, {0, 0.01, 0.1}, I assume these correspond to each model, respectively, regardless of which dataset is used. Is that right?
  5. Does "Epochs {3, 6, 9}" for GLUE mean that BERT-Base was tuned for 3 epochs, BERT-Large for 6, and DistilBERT for 9?

ofirzaf commented 2 years ago

Hi,

As mentioned in the paper, we do a simple grid search over some of the training parameters. When you see several options for a hyperparameter, it means we ran experiments for all possible combinations and picked the best one according to the evaluation set.

For example, when you see learning_rate={1.5e-4, 1.8e-4} and warmup_ratio={0, 0.01, 0.1}, it means we ran all 6 combinations and chose the best one.
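
To make the counting concrete, here is a minimal Python sketch of how such a grid expands into runs. It is only an illustration, not code from this package: `expand_grid`, `pick_best`, and the `evaluate` callback are hypothetical names, and `evaluate` stands in for whatever training-plus-evaluation routine returns the evaluation-set score.

```python
from itertools import product

# Candidate values copied from the example above.
grid = {
    "learning_rate": [1.5e-4, 1.8e-4],
    "warmup_ratio": [0, 0.01, 0.1],
}


def expand_grid(grid):
    """Return one dict per combination of the candidate values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def pick_best(configs, evaluate):
    """Return the config with the highest score.

    `evaluate` is a hypothetical callback: it should train with the given
    config and return the evaluation-set metric (higher is better).
    """
    return max(configs, key=evaluate)


configs = expand_grid(grid)
print(len(configs))  # 2 learning rates x 3 warmup ratios = 6 runs
```

Each entry in `configs` is one full training run, so every additional hyperparameter with several candidate values multiplies the number of runs.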

eldarkurtic commented 2 years ago

Okay, seems like I have misinterpreted them completely. Thanks for the clarification.