Hi! In your code, the function "concat_all_gather" is only used in the enqueue_and_dequeue function, and it is processed on the CPU. I have 3 questions: 1) Why are the prototypes not gathered with this function? I think the prototypes should remain the same across GPUs. 2) When do we need to use this function? 3) Why are the keys detached and transferred to the CPU before gathering (code link is here)? Thanks!
Take the vanilla cross-entropy loss as an example: when computing the objective across GPUs, there is no need for the segmentation logits to be identical on every GPU. For the same reason, the prototypes need not be identical across GPUs, and neither do the queries and their negative keys.
The queue that stores all negative keys, however, should be identical across GPUs, which is why this function is needed there.
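For reference, here is a minimal sketch of a MoCo-style `concat_all_gather` (the exact code in this repo may differ): it gathers the same-shaped tensor from every process and concatenates along the batch dimension, so every GPU ends up holding the keys produced by all GPUs.

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def concat_all_gather(tensor):
    """Gather a tensor from every GPU and concatenate along dim 0.

    Note: all_gather does not propagate gradients, so the output is
    detached from the computation graph by construction.
    """
    tensors_gather = [torch.ones_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(tensors_gather, tensor, async_op=False)
    return torch.cat(tensors_gather, dim=0)
```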
All negative keys are expected to be detached from the computation graph, since the queue is only a memory bank of negatives and no gradients should flow back through it.
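A hedged sketch of how the detach fits into a MoCo-style ring-buffer update, assuming `self.queue` is a `(dim, K)` buffer and `self.queue_ptr` a one-element long tensor (these names are illustrative, not necessarily the ones used in this repo):

```python
@torch.no_grad()
def _dequeue_and_enqueue(self, keys):
    # detach stops gradient flow; gathering afterwards gives every
    # GPU an identical view of the new negative keys
    keys = concat_all_gather(keys.detach())
    batch_size = keys.shape[0]

    ptr = int(self.queue_ptr)
    assert self.K % batch_size == 0  # assume queue size K divides evenly

    # overwrite the oldest entries in the ring buffer with the new keys
    self.queue[:, ptr:ptr + batch_size] = keys.T
    self.queue_ptr[0] = (ptr + batch_size) % self.K
```

Because every GPU enqueues the same gathered keys at the same pointer, the queues stay synchronized without any extra communication.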