I see that the method encourages augmentation consistency between the projection head's output and the prediction head's output for the non-parametric KNN.
This design is unusual to me. I thought a consistency loss is usually applied between outputs from the same part of the network, e.g., both after the projection head or both after the prediction head.
Could the authors share the ideas or reasons behind this design?
Thanks!
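To make the question concrete, here is a toy sketch of the two loss placements I have in mind. All names here (encoder features `h1`/`h2`, projection head `g`, prediction head `q`, cosine-based loss) are my own assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8

# Hypothetical heads as simple linear maps (stand-ins, not the paper's architecture).
Wg = rng.normal(size=(D, D))  # projection head weights
Wq = rng.normal(size=(D, D))  # prediction head weights

def g(h):
    """Projection head."""
    return h @ Wg

def q(z):
    """Prediction head, applied on top of the projection."""
    return z @ Wq

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

h1 = rng.normal(size=D)  # encoder features of augmented view 1
h2 = rng.normal(size=D)  # encoder features of augmented view 2

# "Same part" consistency (what I would expect):
# compare two projection-head outputs.
sym_loss = 1.0 - cosine(g(h1), g(h2))

# The asymmetric design I am asking about:
# prediction-head output of one view vs. projection-head output of the other.
asym_loss = 1.0 - cosine(q(g(h1)), g(h2))

print(sym_loss, asym_loss)
```

Both losses lie in [0, 2] since they are one minus a cosine similarity; the difference is only where each branch's output is taken from.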