LiJunnan1992 / DivideMix

Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
MIT License
543 stars, 84 forks

Labeled data has a size of 0 after training a few epochs #49

Closed. TqXue closed this issue 1 year ago

TqXue commented 1 year ago

Hi, thanks for the nice implementation!

I am trying to run it on my own dataset, but the labeled data becomes empty after a few epochs. The error is as follows:

labeled data has a size of 0
ValueError: num_samples should be a positive integer value, but got num_samples=0

What can I do to solve it? Thanks for your reply!

Hansong-Zhang commented 1 year ago

Maybe your threshold is too high? Try decreasing the threshold applied to the GMM's output. That way, more instances that are likely to be clean will be kept as labeled data.

TqXue commented 1 year ago

Thanks for your advice! Did you mean the parameter 'tol' in gmm = GaussianMixture(n_components=2, max_iter=10, reg_covar=5e-4, tol=1e-3)? Thanks for your reply!

Hansong-Zhang commented 1 year ago

I'm afraid not. I mean the p_threshold argument (the clean-probability threshold), not a GMM parameter. In the original paper, the threshold is set to 0.5. The larger the threshold, the fewer instances are treated as labeled data, so you should decrease it to keep the labeled set from shrinking to 0. I hope this works for you. ^^
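
To make the effect of the threshold concrete, here is a minimal sketch in plain Python. It assumes each sample already has a "clean probability" from the GMM and that a sample is kept as labeled when that probability exceeds the threshold (I believe this matches the p_threshold argument in the training scripts, but treat that as an assumption). The function name and the probabilities below are made up for illustration and are not the repository's code.

```python
def split_by_threshold(prob, p_threshold):
    """Split sample indices into (labeled, unlabeled) by clean probability.

    prob        -- per-sample clean probabilities (hypothetical values here)
    p_threshold -- samples with prob > p_threshold are kept as labeled
    """
    labeled = [i for i, p in enumerate(prob) if p > p_threshold]
    unlabeled = [i for i, p in enumerate(prob) if p <= p_threshold]
    return labeled, unlabeled


# Hypothetical clean probabilities for six samples; none of them is above 0.5.
prob = [0.05, 0.20, 0.35, 0.45, 0.48, 0.10]

for p_threshold in (0.5, 0.3, 0.15):
    labeled, unlabeled = split_by_threshold(prob, p_threshold)
    print(f"p_threshold={p_threshold}: "
          f"{len(labeled)} labeled, {len(unlabeled)} unlabeled")
```

With these made-up values, a threshold of 0.5 leaves the labeled set empty, which is the situation that leads to the num_samples=0 error, while lower thresholds keep some samples as labeled. Lowering the threshold also lets more possibly noisy samples into the labeled set, so it is probably worth checking how many samples pass at each epoch before settling on a value.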

TqXue commented 1 year ago

Thank you for your enthusiastic help! It works!