A problem why use different optimizer for top layer and other layers

facebookresearch / deepcluster

Deep Clustering for Unsupervised Learning of Visual Features


A problem why use different optimizer for top layer and other layers #82

Closed siyuan2018 closed 3 years ago

siyuan2018 commented 3 years ago

Dear authors, when I checked the code, I found that it actually uses different optimizers for the top layer and the other layers. I am wondering what the reason for this is. Thank you!

mathildecaron31 commented 3 years ago

Hi,

Sorry for the delay of my reply.

I used two optimizers, one for the backbone and one for the last layer, because we re-initialize the last layer after every epoch. TBH, this is not the most elegant implementation, and you could definitely use a single optimizer (though not with PIC clustering, where the number of clusters can change from one clustering to the next).
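For readers who want to see the pattern in code, here is a minimal PyTorch sketch of it. This is not the repository's code: `TinyNet`, `train_one_epoch`, the layer sizes, the learning rates, and the random data are all hypothetical and chosen only for illustration. It keeps one long-lived optimizer for the backbone and builds a new optimizer each epoch for the freshly re-initialized last layer, whose output size follows the current number of clusters.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in: a small backbone plus a replaceable top layer."""

    def __init__(self, in_dim=8, feat_dim=16):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.top_layer = None  # created anew at the start of every epoch

    def forward(self, x):
        x = self.features(x)
        if self.top_layer is not None:
            x = self.top_layer(x)
        return x


def train_one_epoch(model, backbone_opt, num_clusters, feat_dim=16, steps=3):
    # Re-initialize the last layer, sized to the current number of clusters.
    model.top_layer = nn.Linear(feat_dim, num_clusters)
    # The new layer has new parameters, so it gets its own new optimizer.
    top_opt = torch.optim.SGD(model.top_layer.parameters(), lr=0.05)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(4, 8)                         # random inputs (illustration only)
        y = torch.randint(0, num_clusters, (4,))      # random pseudo-labels
        backbone_opt.zero_grad()
        top_opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        backbone_opt.step()  # backbone optimizer keeps its momentum across epochs
        top_opt.step()       # top-layer optimizer is discarded after this epoch
    return model.top_layer.out_features


model = TinyNet()
# One optimizer for the backbone, created once and reused for every epoch.
backbone_opt = torch.optim.SGD(model.features.parameters(), lr=0.05, momentum=0.9)
for k in (5, 7):  # the number of clusters may differ between epochs
    train_one_epoch(model, backbone_opt, k)
```

To use a single optimizer instead, one would presumably have to swap the old last-layer parameters out of it (or rebuild it) after each re-initialization, which is the bookkeeping the two-optimizer approach avoids.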