siyuan2018 closed this 3 years ago
Hi,
Sorry for the delay in my reply.
I used two optimizers, one for the backbone and one for the last layer, because we re-initialize the last layer after every epoch. TBH, this is not the most elegant implementation, and you could definitely use a single optimizer (though not with PIC clustering, where the number of clusters can change from one clustering to the next).
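To make that reasoning concrete, here is a minimal PyTorch sketch of the two-optimizer pattern (not the repository's actual code). The backbone's optimizer is created once and keeps its momentum state across epochs, while the top layer and its optimizer are rebuilt every epoch so the layer can take a new number of outputs. The module sizes, learning rates, and the helper names `new_top_layer` and `train_epoch` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins: a tiny backbone and the size of its output features.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
feature_dim = 32

# Created once: its momentum buffers persist from epoch to epoch.
opt_backbone = torch.optim.SGD(backbone.parameters(), lr=0.05, momentum=0.9)


def new_top_layer(num_clusters):
    """Build a fresh classifier plus its own optimizer (hypothetical helper).

    The cluster count may differ between epochs (e.g. with PIC), so the old
    layer's parameters, and any optimizer state tied to them, are discarded.
    """
    top = nn.Linear(feature_dim, num_clusters)
    nn.init.normal_(top.weight, mean=0.0, std=0.01)  # small random init (an assumption)
    nn.init.zeros_(top.bias)
    opt_top = torch.optim.SGD(top.parameters(), lr=0.05)
    return top, opt_top


def train_epoch(x, pseudo_labels, num_clusters):
    """One (single-batch) 'epoch' against the latest cluster assignments."""
    top, opt_top = new_top_layer(num_clusters)
    loss = F.cross_entropy(top(backbone(x)), pseudo_labels)
    opt_backbone.zero_grad()
    opt_top.zero_grad()
    loss.backward()
    opt_backbone.step()  # updates backbone, keeps its momentum history
    opt_top.step()       # updates only the freshly created top layer
    return top, loss.item()


torch.manual_seed(0)
x = torch.randn(8, 16)
for k in (4, 7):  # the number of clusters changes between epochs
    labels = torch.randint(0, k, (8,))
    top, loss = train_epoch(x, labels, k)
    print(f"{k} clusters -> top layer {tuple(top.weight.shape)}, loss {loss:.3f}")
```

With a fixed cluster count, one optimizer with two parameter groups would also work. With PIC, though, the top layer's shape changes, so its old parameters cannot be reused, and rebuilding a separate top-layer optimizer is the simpler choice.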
Hi dear authors, when I checked the code, I found that it actually uses different optimizers for the top layer and for the other layers. I am wondering about the reason for doing so. Thank you!