I noticed that during self-supervised pretraining, MoCo's output dimension is set to 128, and the InfoNCE loss is computed on these 128-dimensional features. However, when training the linear head, the fully connected classification layer is attached to the 2048-dimensional backbone features instead. In my opinion, if the 128-dimensional output represents the latent features, it would make more sense to attach the classification head to that 128-dimensional output.
So may I ask why the code is implemented this way?
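For reference, here is a minimal sketch of the two setups I mean, assuming a torchvision ResNet-50 backbone as in the MoCo paper; the identifiers (`backbone`, `proj_head`, `linear_probe`) are illustrative, not the repository's actual names:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# --- Self-supervised pretraining ---
# ResNet-50 trunk; strip its classifier so it outputs 2048-d pooled features.
backbone = models.resnet50()
backbone.fc = nn.Identity()

# 128-d projection head: only the InfoNCE loss sees these embeddings.
# (MoCo v2 uses a 2-layer MLP here instead of a single linear layer.)
proj_head = nn.Linear(2048, 128)

x = torch.randn(8, 3, 224, 224)
feat_2048 = backbone(x)                                       # (8, 2048)
z_128 = nn.functional.normalize(proj_head(feat_2048), dim=1)  # (8, 128), used by InfoNCE

# --- Linear evaluation ---
# The projection head is discarded; the frozen 2048-d backbone features
# feed the linear classification head.
for p in backbone.parameters():
    p.requires_grad = False
linear_probe = nn.Linear(2048, 1000)  # e.g. 1000 ImageNet classes
logits = linear_probe(backbone(x))    # (8, 1000)
```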