Dear author,
I am confused about the feature vectors used in contrastive learning. The teacher's feature vectors are obtained using only the projection head g. Why are both the projection head g and the prediction head q used to obtain the student's feature vectors? Could the student model use only g as well?