EricKani opened this issue 3 years ago

Hi, first of all, thank you for open-sourcing your code. I have a question about the configuration of the optimizer. I found there is "decode_head" in your model, not the "head" used in custom_keys:

"paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)})"

Will 'lr_mult=10' take effect while we train the model?

Thanks~
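For context, this is roughly how that setting sits inside a full optimizer dict in an MMCV-style config; the lr, momentum, and weight_decay values below are illustrative placeholders, not taken from the issue:

# Illustrative optimizer config; only paramwise_cfg is the setting
# under discussion, the other values are placeholders.
optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0005,
    paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))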
Got it. I found the corresponding note in the DefaultOptimizerConstructor class. Because 'head' is a substring of both 'decode_head' and 'auxiliary_head', the parameters of those heads will be configured with 'lr_mult=10':
"custom_keys (dict): Specified parameters-wise settings by keys. If one of the keys in custom_keys is a substring of the name of one parameter, then the setting of the parameter will be specified by custom_keys[key] and other setting like bias_lr_mult etc. will be ignored. It should be noted that the aforementioned key is the longest key that is a substring of the name of the parameter. If there are multiple matched keys with the same length, then the key with lower alphabet order will be chosen. custom_keys[key] should be a dict and may contain fields lr_mult and decay_mult. See Example 2 below."

Thanks~
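For illustration, here is a minimal standalone sketch of the matching rule quoted above. It is a simplified restatement of the documented behavior, not MMCV's actual implementation, and the parameter names are made up:

custom_keys = {'head': dict(lr_mult=10.)}
base_lr = 0.01

def lr_for(param_name):
    # Among all keys that are substrings of the parameter name,
    # take the longest one; ties break by alphabetical order.
    matched = [k for k in custom_keys if k in param_name]
    if not matched:
        return base_lr
    key = sorted(matched, key=lambda k: (-len(k), k))[0]
    return base_lr * custom_keys[key].get('lr_mult', 1.0)

print(lr_for('backbone.layer1.conv1.weight'))  # 0.01: no key matches
print(lr_for('decode_head.conv_seg.weight'))   # 0.1:  'head' is a substring
print(lr_for('auxiliary_head.conv_seg.bias'))  # 0.1:  'head' is a substring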
But I found that nothing is printed when 'recurse=False' (code in DefaultOptimizerConstructor.add_params):

for name, param in module.named_parameters(recurse=False):
    print(name)
I also printed the contents of the built optimizer. It has 363 param_groups, and the lr in them is indeed modified by 'lr_mult=10'. But I don't understand why 'lr_mult=10' takes effect when "for name, param in module.named_parameters(recurse=False):" yields nothing.
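For reference, a short plain-PyTorch sketch of what recurse=False changes; the model here is a made-up stand-in and has nothing to do with the actual network:

import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# recurse=False yields only parameters registered directly on `model`;
# a container module usually owns none itself, so this prints nothing.
for name, _ in model.named_parameters(recurse=False):
    print('direct:', name)

# The default recurse=True walks all submodules instead:
for name, _ in model.named_parameters():
    print('all:', name)  # 0.weight, 0.bias, 1.weight, 1.bias

As far as I can tell from the MMCV source, add_params also calls itself on each entry of module.named_children(), so the loop that looks empty at the top-level module still visits every parameter at some depth, which would explain the 363 param_groups.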
"paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}"
Hi, thank you for open-source your code firstly. I have a question about the configuration of the optimizer. I found there is "decode_head" in your model, not "head" used in 'custom_keys'. Will 'lr_mult=10' takes effect while we training the model?
Thanks~