Closed · mengxianghan123 closed this issue 10 months ago
pieta:
Thanks for your GREAT WORK!! But it seems that when training ImageNet, the cluster head's output is never used in main_efficient.py. Could you please offer some explanations? Thanks a lot!

mengxianghan123:
Thanks for your question, and sorry for the late reply. It's a purely empirical engineering finding: we found that sharing the feature head and the cluster head is effective in alleviating training collapse on large-scale datasets. However, this also suggests that the current training strategy is suboptimal. If you are interested in pushing the performance, I would suggest:
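(Editorial note for readers: a minimal sketch of what "sharing the feature head and the cluster head" could look like, assuming a PyTorch setup. This is not the repo's actual implementation; the class name, dimensions, and layer choices are illustrative assumptions. The idea is that both objectives read from one shared trunk, so the cluster branch cannot degenerate independently of the feature branch.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedHead(nn.Module):
    """One projection trunk serving both the feature (contrastive) branch
    and the cluster-assignment branch, instead of two separate heads.
    All names and dimensions are illustrative, not taken from the repo."""

    def __init__(self, in_dim=2048, feat_dim=128, num_clusters=1000):
        super().__init__()
        # Shared MLP trunk replaces separate feature/cluster heads.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, in_dim),
            nn.ReLU(inplace=True),
            nn.Linear(in_dim, feat_dim),
        )
        # Lightweight final layer mapping the shared embedding to cluster logits.
        self.to_clusters = nn.Linear(feat_dim, num_clusters)

    def forward(self, h):
        z = F.normalize(self.shared(h), dim=1)  # feature embedding (unit norm)
        logits = self.to_clusters(z)            # cluster-assignment logits
        return z, logits

# Usage: both losses backpropagate through the same trunk.
head = SharedHead()
h = torch.randn(4, 2048)   # backbone features for a batch of 4
z, logits = head(h)
```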
pieta:
OK! Thanks for your helpful suggestions!