Heimine / NC_MLab

Neural Collapse in Multi-label Learning with Pick-all-label Loss
https://arxiv.org/abs/2310.15903

The metric for training with fixed ETF classifier. #1

Open · Xuyh-s opened this issue 1 month ago

Xuyh-s commented 1 month ago

Hi, I am trying to run your code following README.md. However, I get very poor results (test IoU metric) on the MLab-MNIST dataset. Could you please provide more detailed training parameters or the training code?

For training with fixed ETF classifier

$ sh submit_param_eff_etf.sh <Network Architecture> <Saved Generated Dataset Path>
Heimine commented 6 days ago

Apologies for the late response; I hadn't noticed the GitHub issue until just now...

Yes, please try changing the argument lamb_h to a smaller value; that should work.

E.g., change the command inside submit_param_eff_etf.sh to:

python train_param_eff.py --arch $arch --etf --fix_dim --dataset_root $data_path --epochs 200 --lr 0.1 --lamb_h 0.0

Xuyh-s commented 6 days ago

... please try changing the argument lamb_h to a smaller value ...

Thank you for your response! Before your reply, I had tried changing line 285 of ./models/resnet.py from features = x to features = F.normalize(x), and the results effectively improved to match those reported in your paper. May I ask whether features is normalized in your work? In addition, how is the scaled average of the multi-label ETF implemented in your code?
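
For reference, a minimal runnable sketch of the change I mean (the tensor shapes here are illustrative; F.normalize is torch.nn.functional.normalize, which L2-normalizes along dim=1 by default):

import torch
import torch.nn.functional as F

x = torch.randn(8, 512)  # stand-in for the penultimate activations in ./models/resnet.py

# original: features = x
# edited:   L2-normalize each feature vector so ||features_i||_2 == 1
features = F.normalize(x)  # p=2, dim=1 by default

print(features.norm(dim=1))  # all ones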

Heimine commented 6 days ago

... the results effectively improved to match those reported in your paper

That's interesting. I don't believe we tried this in our experiments, but I suspect this normalization works similarly to changing lamb_h to a smaller value.
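
To make the connection concrete, here is a hypothetical sketch (an assumption on my part: lamb_h acting as the weight of an L2 penalty on the features h; the actual wiring in train_param_eff.py may differ). With the penalty at zero the feature norms are unconstrained, and normalizing the features removes that same scale degree of freedom:

import torch

def loss_with_feature_penalty(task_loss, features, lamb_h=0.0):
    # Hypothetical: lamb_h weights an L2 penalty on the penultimate features.
    # task_loss: the pick-all-label loss already computed for the batch
    # features:  (batch, d) penultimate-layer features
    return task_loss + lamb_h * features.pow(2).sum(dim=1).mean()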

May I ask whether features is normalized in your work?

We actually used the un-normalized features in all our experiments (both for training and NC calculation).

... how is the scaled average of the multi-label ETF implemented in your code?

I'm not sure if I understand this... Are you asking about the calculation of $\mathcal{NC}_m$? If that's the case, the code is implemented in the angle_metric function in utility.py. If that's not the case, could you clarify the question?
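
In case it helps while you read the code, the scaled-average idea is roughly the following (an illustrative sketch with made-up names, not the exact angle_metric implementation in utility.py):

import torch

def scaled_average_targets(etf, label_sets):
    # etf:        (d, K) simplex-ETF matrix, one column per label
    # label_sets: iterable of label-index tuples, e.g. [(0, 3), (1, 2)]
    # For a label set S, the target mean direction is the (scaled) average
    # of the ETF columns indexed by S.
    return torch.stack([etf[:, list(s)].mean(dim=1) for s in label_sets], dim=1)

def cosine_to_target(mean_feat, target):
    # cosine between an observed (centered) class-mean feature and its target
    return torch.nn.functional.cosine_similarity(mean_feat, target, dim=0)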

Thanks!