Justherozen / ProMix

[IJCAI 2023] ProMix: Combating Label Noise via Maximizing Clean Sample Utility

About inference and training #13

Closed by nengwp 2 months ago

nengwp commented 3 months ago

Because ProMix trains two equivalent networks, there are a few minor issues to consider when it comes to inference:

a) Considering the balance between inference speed and performance, should I take the mean of the two networks' outputs or use just one of them?

b) Is there a robustness issue with the network trained using weak data augmentation?

Additionally, regarding the training process, there are a few minor concerns:

c) What impact would using two networks with different architectures have on the algorithm?

d) How should difficult samples in the training set (correctly labeled but with large loss and incorrect predictions) be further handled?

I appreciate your help.

nengwp commented 2 months ago

@Justherozen

Justherozen commented 2 months ago

Thank you for your question, and apologies for the late reply.

(a) In our experiments, we found that using the mean output of the two peer networks as an ensemble achieves slightly better results, while using a single network alone still reaches satisfactory performance (see the sketch below). The additional inference latency introduced by the peer network is not significant at test time; the majority of the additional computation cost is incurred during the training stage.

(b) We did not observe any robustness issues from using weak augmentation, which is also a common technique in traditional weakly supervised learning settings.

(c) We have not conducted such experiments, but I speculate that using two networks with different architectures but similar performance levels would not have any adverse effects, as our framework is designed to fit different network structures (different architectures might even offer more opportunities for fitting and co-teaching).

(d) The selection of difficult samples has always been a challenge in noisy-label learning. In our work, we adopt a progressive selection strategy to dynamically expand the selected clean subset. Our intention is to gradually filter more of these difficult samples into the clean subset during the later stages of training (a generic sketch follows below).
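
For point (a), here is a minimal sketch of the ensemble inference described above, assuming two standard PyTorch classifiers; `net1`, `net2`, and `ensemble_predict` are illustrative names, not identifiers from this repository:

```python
import torch

@torch.no_grad()
def ensemble_predict(net1, net2, images):
    """Average the softmax outputs of the two peer networks; either network
    alone can also be used for faster inference at a small accuracy cost."""
    net1.eval()
    net2.eval()
    probs1 = torch.softmax(net1(images), dim=1)
    probs2 = torch.softmax(net2(images), dim=1)
    return ((probs1 + probs2) / 2).argmax(dim=1)
```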
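
For point (d), the snippet below is not the exact ProMix selection rule, only a generic illustration of progressively expanding a clean subset via class-wise small-loss selection; the linear schedule, the fractions, and all names are assumptions made for the sketch:

```python
import torch

def select_clean_subset(losses, labels, epoch, num_epochs,
                        start_frac=0.5, end_frac=0.9):
    """Mark the smallest-loss samples of each class as clean, keeping a
    fraction that grows linearly over training so the subset expands."""
    frac = start_frac + (end_frac - start_frac) * epoch / max(1, num_epochs - 1)
    clean_mask = torch.zeros_like(labels, dtype=torch.bool)
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        k = max(1, int(frac * idx.numel()))
        # treat the k lowest-loss samples of this class as clean
        chosen = idx[losses[idx].topk(k, largest=False).indices]
        clean_mask[chosen] = True
    return clean_mask
```

As the kept fraction grows toward the end of training, harder but correctly labeled samples with larger losses are gradually admitted into the clean subset, which is the intuition behind the progressive strategy mentioned in the reply.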