Have you tried using different output channels for the single (instance) projection and the dense projection? In particular, the `DenseCLNeck` implementation uses the same hidden channels and output channels for both the single MLP and the dense MLP. As far as I know, projecting the instance representation requires more channels than projecting the dense representation, so treating the two equally might lose a lot of useful information from the instance representation. What do you think about this? Also, most instance discrimination methods design the projector as fc-bn-relu-fc, so I wonder why you dropped BN in `DenseCLNeck`. Is it just for simplicity?
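To make the question concrete, here is a rough sketch of the variant I have in mind. It is not the repo's `DenseCLNeck`: the class name `TwoBranchNeck`, the arguments `single_out`, `dense_out` and `with_bn`, and the use of 1x1 convolutions for the dense branch are all my own assumptions, and the channel sizes in the usage lines are arbitrary.

```python
import torch
import torch.nn as nn


class TwoBranchNeck(nn.Module):
    """Hypothetical neck with separate output widths per branch (illustration only)."""

    def __init__(self, in_channels, hid_channels, single_out, dense_out, with_bn=False):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

        # Instance branch: fc-(bn)-relu-fc on the globally pooled feature.
        single_layers = [nn.Linear(in_channels, hid_channels)]
        if with_bn:
            single_layers.append(nn.BatchNorm1d(hid_channels))
        single_layers += [nn.ReLU(inplace=True), nn.Linear(hid_channels, single_out)]
        self.single_mlp = nn.Sequential(*single_layers)

        # Dense branch: the same structure applied per location via 1x1 convs
        # (assumption), with its own, possibly smaller, output width.
        dense_layers = [nn.Conv2d(in_channels, hid_channels, kernel_size=1)]
        if with_bn:
            dense_layers.append(nn.BatchNorm2d(hid_channels))
        dense_layers += [nn.ReLU(inplace=True), nn.Conv2d(hid_channels, dense_out, kernel_size=1)]
        self.dense_mlp = nn.Sequential(*dense_layers)

    def forward(self, x):
        # x: backbone feature map of shape (N, C, H, W)
        global_feat = self.single_mlp(self.avgpool(x).flatten(1))  # (N, single_out)
        dense_feat = self.dense_mlp(x).flatten(2)                  # (N, dense_out, H*W)
        return global_feat, dense_feat


# Usage with arbitrary sizes: a wider instance projection than dense projection.
neck = TwoBranchNeck(64, 64, single_out=32, dense_out=16, with_bn=True)
global_feat, dense_feat = neck(torch.randn(4, 64, 7, 7))
```

The hidden width could be split per branch in the same way; I kept a single `hid_channels` here only to keep the sketch short.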