facebookresearch / moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
MIT License

Concerns about feature dimensionality in MoCo self-training #139

Open LALBJ opened 1 year ago

LALBJ commented 1 year ago

I noticed that during self-supervised training, MoCo's output dimension is set to 128, and the InfoNCE loss is computed on these 128-dimensional features. However, when training the linear head, the fully connected classifier is attached to the 2048-dimensional backbone features instead. In my opinion, if the 128-dimensional output represents the latent features, it would be better to attach the classification head to the 128-dimensional output.
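For reference, here is a rough sketch of the two setups I am comparing (illustrative only, not the exact code in this repository; layer names and dimensions are my assumptions based on ResNet-50):

```python
# Sketch: the 128-d projection output is used only for the InfoNCE loss during
# pretraining, while the linear probe is trained on the 2048-d backbone features.
import torch
import torch.nn as nn
import torchvision.models as models

# Pretraining: ResNet-50 whose final fc is replaced by a 2048 -> 128 projection.
encoder = models.resnet50(num_classes=128)        # fc: 2048 -> 128
x = torch.randn(8, 3, 224, 224)
q = nn.functional.normalize(encoder(x), dim=1)    # 128-d embeddings for InfoNCE

# Linear evaluation: drop the projection head and train a new classifier
# directly on the 2048-d pooled backbone features.
backbone = nn.Sequential(*list(encoder.children())[:-1])  # remove the fc layer
feats = backbone(x).flatten(1)                             # shape: (8, 2048)
linear_head = nn.Linear(2048, 1000)                        # trained from scratch
logits = linear_head(feats)
```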

May I ask what the reason is for this implementation choice in the code?