Closed: RGring closed this issue 3 years ago
Hi Mathilde,

In your SwAV paper, I understand that both the backbone and the prototypes are updated. I was therefore wondering why you call embeddings.detach() (https://github.com/facebookresearch/swav/blob/master/main_swav.py#L291) in your script. I thought that when a tensor is detached, no gradients are back-propagated through that variable.

Thanks in advance for your help!
Hi @RGring,

The embedding (B x 128) tensor corresponds to the normalized feature-space vectors for all views in the batch. This tensor is only used to fill the queue, so it is not involved in the gradient computation.

Hope that helps!

True, my bad. I was confusing the order of the two!
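For illustration, here is a minimal PyTorch sketch of the point above. The backbone, prototypes, and loss here are simplified stand-ins, not the actual SwAV code: it shows that the detached copy stored in the queue carries no computation graph, while gradients from the loss still reach both the backbone and the prototypes.

```python
import torch
import torch.nn as nn

# Hypothetical tiny setup standing in for the roles these modules play
# in main_swav.py (names and sizes are illustrative, not the real code).
backbone = nn.Linear(16, 128)
prototypes = nn.Linear(128, 10, bias=False)

x = torch.randn(4, 16)  # a batch of 4 "views"
embedding = nn.functional.normalize(backbone(x), dim=1, p=2)

# detach() returns a tensor cut off from the autograd graph, so storing
# it in a queue keeps no reference to the backbone's computation graph.
queued = embedding.detach()
assert not queued.requires_grad

# The loss is computed from prototype scores of the *attached* embedding,
# so gradients still flow into both the backbone and the prototypes.
scores = prototypes(embedding)
loss = -scores.log_softmax(dim=1).mean()  # stand-in for the SwAV loss
loss.backward()

assert backbone.weight.grad is not None    # backbone gets updated
assert prototypes.weight.grad is not None  # prototypes get updated
```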