google-research / simclr

SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
https://arxiv.org/abs/2006.10029
Apache License 2.0

Loss, entropy, accuracy trends #188

Open slala2121 opened 2 years ago

slala2121 commented 2 years ago

I'm trying to understand the relationships and trends among these quantities.

From some experiments, I find that

  1. As the training loss declines, the entropy of the distribution increases rather than decreases. This seems plausible because, near convergence, the probability scores for all the negative examples are relatively low and roughly equal, while the probability score for the positive example increases (a small numeric sketch of these quantities follows this list).

  2. For a small training dataset (~10^3 samples), I nevertheless find that the accuracy declines. Why might this occur?
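
For concreteness, here is a minimal NumPy sketch of the quantities I mean (not the repository's implementation; the temperature and similarity values are made up): the softmax distribution over one positive and the in-batch negatives, the resulting cross-entropy loss, its entropy, and whether the positive would be predicted correctly.

```python
# Minimal NumPy sketch (not the repo's code): softmax over one positive and
# the in-batch negatives, the NT-Xent-style loss, its entropy, and correctness.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

temperature = 0.1                      # illustrative value
# cosine similarities of one anchor to its positive (index 0) and 6 negatives
sims = np.array([0.9, 0.2, 0.1, 0.15, 0.05, 0.1, 0.2])
probs = softmax(sims / temperature)

loss = -np.log(probs[0])               # cross-entropy with the positive as the label
entropy = -(probs * np.log(probs)).sum()
correct = probs.argmax() == 0          # contributes to the contrastive accuracy

print(f"p(positive)={probs[0]:.3f}  loss={loss:.3f}  entropy={entropy:.3f}  correct={correct}")
```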

Thanks.

chentingpc commented 2 years ago

As the training loss declines, the entropy of the distribution increases rather than decreases. This seems plausible because, near convergence, the probability scores for all the negative examples are relatively low and roughly equal, while the probability score for the positive example increases.

Yes, it becomes more certain which example is the positive as training progresses.

For a small training dataset (~10^3 samples), I nevertheless find that the accuracy declines. Why might this occur?

Is it overfitting? Otherwise, a hyperparameter may be problematic, e.g. the learning rate is too large (if you warm up for too long, the learning rate will still be large after a certain number of epochs, and training becomes worse).
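
To illustrate the warmup point, here is a small sketch of a linear-warmup + cosine-decay schedule of the kind used in SimCLR-style training (the step counts and base learning rate below are purely illustrative, not the repository's defaults): with a long warmup, the learning rate is still near its peak at steps where a short-warmup schedule has already decayed it.

```python
# Sketch of a linear-warmup + cosine-decay schedule (illustrative numbers,
# not the repo's exact implementation).
import math

def learning_rate(step, base_lr, warmup_steps, total_steps):
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)   # LR keeps growing during warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000
for warmup in (50, 500):                               # short vs. very long warmup
    for step in (100, 400, 800):
        lr = learning_rate(step, base_lr=1.0, warmup_steps=warmup, total_steps=total_steps)
        print(f"warmup={warmup:4d}  step={step:4d}  lr={lr:.3f}")
# With warmup=500 the LR is still large late in training, which is the
# situation described above.
```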

slala2121 commented 2 years ago

Could you explain how the contrastive accuracy is computed? I could understand the potential for overfitting if it were measured on samples different from those used to compute the training loss. From the code, though, it seems that the loss and the accuracy are computed over similar quantities.

chentingpc commented 2 years ago

It's the accuracy of predicting the positive example among all candidates within the mini-batch.
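
As a rough illustration of that computation (a NumPy sketch with a random logits matrix, not the repository's TensorFlow code): each row of the logits matrix holds one example's similarities to every candidate in the mini-batch, and the example counts as correct when the argmax of its row is its positive.

```python
# Sketch of the contrastive accuracy described above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
batch = 8
logits = rng.normal(size=(batch, batch))   # similarity of each anchor to every candidate
positive_idx = np.arange(batch)            # index of each example's positive candidate

predictions = logits.argmax(axis=1)        # most similar candidate per example
contrastive_accuracy = (predictions == positive_idx).mean()
print(contrastive_accuracy)
```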

slala2121 commented 2 years ago

Okay. Then I'm not sure why overfitting would occur, since the accuracy is measured on the same samples used for training.

sagi-ezri commented 1 year ago

It is possible that, as the training loss declines, the model becomes more confident in its predictions while the measured entropy still increases: the model assigns a higher probability to the positive example and pushes the negative examples toward uniformly low probabilities, so the distribution over the negatives becomes flatter, which raises its entropy.

Regarding the second observation, there are a few possibilities:

  1. The model may be too complex for the small training dataset and therefore fails to generalize well to new examples. In this case, reducing the model's complexity or collecting more training data could potentially improve performance.

  2. The model may be underfitting the training data, which can also result in poor accuracy. Underfitting occurs when the model is not complex enough to capture the underlying patterns in the data; increasing the model's complexity or changing the architecture may help.

  3. The accuracy measure being used may not be sensitive enough to detect differences in performance. In such cases, other evaluation metrics such as precision, recall, or the F1 score may be more appropriate.

I hope this helps clarify these issues.