Closed · mengxianghan123 closed this issue 10 months ago
pieta:
Thanks for your GREAT WORK!! But it seems that when training ImageNet, the cluster head's output is never used in main_efficient.py. Could you please offer some explanations? Thanks a lot!

mengxianghan123:
Thanks for your question, and sorry for the late reply. It's a purely empirical engineering finding: we found that sharing the feature head and the cluster head is effective in alleviating training collapse on large-scale datasets. However, this also suggests that the current training strategy is suboptimal. If you are interested in pushing the performance, I would suggest:
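(Editorial note for readers: a minimal sketch of what "sharing the feature head and the cluster head" could look like, assuming a PyTorch setup. This is not the repo's actual implementation; the class name, dimensions, and layer choices are illustrative assumptions. The idea is that both objectives read from one shared trunk, so the cluster branch cannot degenerate independently of the feature branch.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedHead(nn.Module):
    """One projection trunk serving both the feature (contrastive) branch
    and the cluster-assignment branch, instead of two separate heads.
    All names and dimensions are illustrative, not taken from the repo."""

    def __init__(self, in_dim=2048, feat_dim=128, num_clusters=1000):
        super().__init__()
        # Shared MLP trunk replaces separate feature/cluster heads.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, in_dim),
            nn.ReLU(inplace=True),
            nn.Linear(in_dim, feat_dim),
        )
        # Lightweight final layer mapping the shared embedding to cluster logits.
        self.to_clusters = nn.Linear(feat_dim, num_clusters)

    def forward(self, h):
        z = F.normalize(self.shared(h), dim=1)  # feature embedding (unit norm)
        logits = self.to_clusters(z)            # cluster-assignment logits
        return z, logits

# Usage: both losses backpropagate through the same trunk.
head = SharedHead()
h = torch.randn(4, 2048)   # backbone features for a batch of 4
z, logits = head(h)
```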
pieta:
OK! Thanks for your helpful suggestions!