Closed lqxisok closed 3 years ago
Sorry for the late reply. `feat_mat2` is computed on the mini-batch samples, so it provides "live" features as opposed to memory features. The memory features are not directly updated to minimize the loss, whereas the mini-batch features can be influenced by it. In this sense, using `feat_mat2` can make a difference.
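To make the distinction concrete, here is a minimal NumPy sketch of how the entropy in `loss_nc` could be formed from the three concatenated pieces. This is not the repo's actual code: the shapes, the temperature value, and the diagonal masking of `feat_mat2` are assumptions for illustration. The key point is that only the rows of `feat_mat2` are built from features inside the current mini-batch, so in the real training graph their gradients flow back into the encoder, while the memory-bank entries behind `feat_mat` are maintained outside the loss.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: B mini-batch samples, K classes, M memory entries, D feature dims.
rng = np.random.default_rng(0)
B, K, M, D = 4, 3, 8, 5

# "Live" mini-batch features (would receive gradients during training).
feat = rng.normal(size=(B, D))
feat /= np.linalg.norm(feat, axis=1, keepdims=True)

# Memory-bank features (updated outside the loss, not by backprop).
mem_bank = rng.normal(size=(M, D))
mem_bank /= np.linalg.norm(mem_bank, axis=1, keepdims=True)

out_t = rng.normal(size=(B, K))       # classifier logits (the W of Sec. 3.2)
feat_mat = feat @ mem_bank.T          # similarity to memory features (the V of Sec. 3.2)
feat_mat2 = feat @ feat.T             # similarity to the other "live" mini-batch features
np.fill_diagonal(feat_mat2, -np.inf)  # assumed: mask self-similarity

# Entropy over the softmax of the concatenation (0.05 is an assumed temperature).
logits = np.concatenate([out_t, feat_mat, feat_mat2], axis=1)
p = softmax(logits / 0.05)
loss_nc = -(p * np.log(p + 1e-12)).sum(axis=1).mean()
```

In a PyTorch version of this sketch, `mem_bank` would be a buffer kept outside the autograd graph, so minimizing `loss_nc` can only move the encoder through `out_t`, `feat_mat`, and `feat_mat2`; the `feat_mat2` term is the one where both sides of the similarity are trainable.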
After carefully checking the training code, I have a small question about `feat_mat2` in `loss_nc`. How does `feat_mat2` help the performance of neighborhood clustering? I see that when computing `loss_nc`, the entropy is fed with the concatenation of three values: `out_t`, `feat_mat`, and `feat_mat2`. It turns out that `out_t` and `feat_mat` denote the $W$ and $V$, respectively, illustrated in Sec. 3.2. So I guess `feat_mat2` has the same influence as `feat_mat`, but it only operates on the mini-batch data, doesn't it? Is there a reasonable explanation for this? Thanks in advance.