LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers
https://arxiv.org/abs/2004.08249
Apache License 2.0
326 stars 20 forks

Is the "attention_ratio_change" and "fc_ratio_change" trainable or not? #3

Closed gotobelieve closed 4 years ago

gotobelieve commented 4 years ago

Hi, according to the code in transformer_layer.py, these two variables are trainable parameters. I'm not sure what the reason is behind this setting?

LiyuanLucasLiu commented 4 years ago

Is there a specific reason for making these parameters non-trainable? I believe their computational overhead should be negligible. Typically all model parameters are trainable by default, and I'm not sure whether fixing these parameters would lead to a performance drop.
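For anyone who wants to try the non-trainable variant discussed above, a minimal PyTorch sketch is below. It uses a toy module with a single illustrative scalar parameter (named `attention_ratio_change` only to mirror the discussion; the actual transformer_layer.py in this repo is more involved), and freezes it via `requires_grad_(False)` so the optimizer skips it:

```python
import torch
import torch.nn as nn

class ToyLayer(nn.Module):
    """Illustrative stand-in; not the repo's actual transformer layer."""
    def __init__(self):
        super().__init__()
        # Scalar rescaling parameter; nn.Parameter is trainable by default.
        self.attention_ratio_change = nn.Parameter(torch.ones(1))
        self.fc = nn.Linear(4, 4)

layer = ToyLayer()
# Freeze the ratio parameter in place; its value is kept but no longer updated.
layer.attention_ratio_change.requires_grad_(False)

# Only parameters with requires_grad=True are passed to the optimizer.
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
optimizer = torch.optim.SGD(
    (p for p in layer.parameters() if p.requires_grad), lr=0.1
)
```

After freezing, `trainable` contains only the `fc` weights and bias, so comparing this run against the default (all-trainable) run would answer whether fixing the ratios causes a performance drop.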

gotobelieve commented 4 years ago

Thanks, I will try it later.