Hi
Thanks for your amazing project. I'm confused about the learning rate setting. In another issue you mentioned "in order to detect the key points accurately, it is necessary to have a large learning rate for the key points". Does that mean the learning rate in the lmk branch should be larger than in the classification and bbox regression branches?
But in your code I found that the learning rate is the same for the whole model:
def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size):
    warmup_epoch = -1
    if epoch <= warmup_epoch:
        lr = 1e-6 + (initial_lr-1e-6) * iteration / (epoch_size * warmup_epoch)
    else:
        lr = initial_lr * (gamma ** (step_index))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
And in the MXNet version of RetinaFace, I found that the author uses a smaller learning rate in the lmk branch.
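To make the question concrete, here is a rough sketch of what I imagined a per-branch learning rate would look like. It is not taken from this repo or from the MXNet one. The `lr_mult` key, the group names, and the multiplier value 2.0 are all my own assumptions for illustration, and the optimizer is replaced by a stand-in object that only has `param_groups`. Warmup is left out to keep it short.

```python
from types import SimpleNamespace


def adjust_learning_rate_per_group(optimizer, initial_lr, gamma, step_index):
    """Step decay, scaled per parameter group by a hypothetical 'lr_mult' key."""
    base_lr = initial_lr * (gamma ** step_index)
    for param_group in optimizer.param_groups:
        # Groups without 'lr_mult' fall back to the shared base rate.
        param_group['lr'] = base_lr * param_group.get('lr_mult', 1.0)
    return base_lr


# Stand-in for an optimizer: only the `param_groups` list of dicts is used here.
optimizer = SimpleNamespace(param_groups=[
    {'name': 'backbone_cls_bbox', 'lr_mult': 1.0},
    {'name': 'lmk', 'lr_mult': 2.0},  # illustrative value only, not a recommendation
])

adjust_learning_rate_per_group(optimizer, initial_lr=1e-3, gamma=0.1, step_index=1)
for group in optimizer.param_groups:
    print(group['name'], group['lr'])
```

With a real optimizer I would expect to build the groups by passing a list of dicts, each with its own `params` entry, but I have not tried that against this codebase. Is something like this what you meant, or is the landmark branch weighted in some other way?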