Hello there,

I saw that you are using two optimizers for DeepCluster: one for the entire network except the top layer, and one for the top layer that is re-attached every cycle.

    # create an optimizer for the last fc layer
    optimizer_tl = torch.optim.SGD(
        model.top_layer.parameters(),
        lr=args.lr,
        weight_decay=10**args.wd,
    )

@mathildecaron31 Wouldn't it be better to use a single optimizer with different param_groups, as described in the PyTorch docs?
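
To make the suggestion concrete, here is a minimal sketch of what I mean by a single optimizer with per-group options. `TinyModel`, its layer sizes, and the `lr`/`wd` values are hypothetical stand-ins, not the actual DeepCluster code; only the `top_layer` attribute name and the `10**wd` weight decay are taken from the snippet above.

```python
import torch
import torch.nn as nn


# Hypothetical stand-in for the real model: a feature extractor plus a
# separate `top_layer`, mirroring the attribute name used in the snippet above.
class TinyModel(nn.Module):
    def __init__(self, num_clusters=10):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
        self.top_layer = nn.Linear(16, num_clusters)

    def forward(self, x):
        return self.top_layer(self.features(x))


model = TinyModel()
lr, wd = 0.05, -5  # assumed values, standing in for args.lr and args.wd

# One optimizer, two parameter groups. Options given inside a group override
# the defaults passed as keyword arguments to the optimizer.
optimizer = torch.optim.SGD(
    [
        {"params": model.features.parameters()},
        {"params": model.top_layer.parameters(), "weight_decay": 10 ** wd},
    ],
    lr=lr,
    momentum=0.9,
)

# A single zero_grad()/step() pair then updates both groups.
optimizer.zero_grad()
loss = model(torch.randn(4, 32)).sum()
loss.backward()
optimizer.step()
```

One caveat I can see: if the top layer is re-created every cycle, its new parameters would have to be registered again, for example with `optimizer.add_param_group(...)`, and the stale group dropped by hand, since as far as I know there is no dedicated method for removing a group. That might be a reason for keeping a separate optimizer, but I would be interested to hear whether it was the motivation.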