bnu-wangxun / Deep_Metric

Deep Metric Learning
Apache License 2.0

XBM memory management after detach() operation #45

Closed amaralibey closed 4 years ago

amaralibey commented 4 years ago

I have a question about Algorithm 1 in your paper. You mention that the embeddings (called anchors) are detached from the graph before being put into the queue. Tensors detached from the graph do not retain their gradient history, so these embeddings are treated as constants in the following iterations, making them irrelevant to the gradient of the loss. I don't understand how the gradient can still be calculated with respect to old batches when their gradient history from the earlier forward passes is not retained?
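For what it's worth, here is a minimal sketch (not this repository's code, names invented for illustration) of what detach() implies for gradient flow:

```python
import torch

x = torch.randn(4, 8, requires_grad=True)   # current-batch embeddings
queued = x.detach()                          # the copy that would go into the queue

# `queued` shares storage with `x` but is cut off from the graph: it acts as a
# constant in any later loss, so no gradient flows back through it.
loss = (queued * x).sum()
loss.backward()

print(queued.requires_grad)   # False: treated as a constant
print(x.grad.shape)           # torch.Size([4, 8]): gradient arrives only via the x path
```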

Thank you.

ZhangHZ9 commented 4 years ago

You're right. We don't calculate the gradient of features from old batches (anchors.transpose()), but only back-propagate the gradient to current features (M.feats).

# Memory features (anchors) were stored after detach(), so they carry no graph;
# the gradient of this similarity flows only into the current features M.feats.
sim = torch.matmul(anchors.transpose(), M.feats)
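A minimal, self-contained sketch of that pattern (the loss and the names batch_feats / xbm_feats are illustrative, not necessarily this repository's code): the memory side is detached, so backward() only reaches the current batch.

```python
import torch
import torch.nn.functional as F

def xbm_contrastive_sketch(batch_feats, batch_labels, xbm_feats, xbm_labels, margin=0.5):
    # batch_feats: current embeddings, attached to the graph (receive gradients)
    # xbm_feats:   embeddings stored earlier with .detach(), treated as constants
    sim = torch.matmul(batch_feats, xbm_feats.t())            # (B, M) similarity matrix
    pos_mask = batch_labels.unsqueeze(1) == xbm_labels.unsqueeze(0)
    pos_loss = (1.0 - sim)[pos_mask].sum()                    # pull positive pairs together
    neg_loss = F.relu(sim - margin)[~pos_mask].sum()          # push negatives below the margin
    return pos_loss + neg_loss

# Toy usage: gradients reach the current batch only; the memory stays constant.
raw = torch.randn(8, 128, requires_grad=True)                 # stand-in for backbone output
batch_feats = F.normalize(raw, dim=1)
batch_labels = torch.randint(0, 4, (8,))
xbm_feats = F.normalize(torch.randn(64, 128), dim=1)          # as if stored via detach()
xbm_labels = torch.randint(0, 4, (64,))

loss = xbm_contrastive_sketch(batch_feats, batch_labels, xbm_feats, xbm_labels)
loss.backward()
print(raw.grad is not None)       # True: the current batch received gradients
print(xbm_feats.requires_grad)    # False: old-batch features contribute no gradient
```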