DonkeyShot21 / cassle

Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)

How did you train the classifier? #1

Closed newhand555 closed 2 years ago

newhand555 commented 2 years ago

Hello,

I have read your paper, and it is very impressive. I have a question about the class-incremental setting, and I am wondering if you could answer it.

Did you train the classifier for each task only during the embedding training process? Or did you re-train all classifiers after the embedding training for all tasks had finished? I see that the embeddings of a previous task may change after the next task is trained. How does the old classifier, which was trained on the old embeddings, handle these changed embeddings? Your paper mentions "a subset, e.g., 10% of the data". Does this mean using 10% of the data to retrain the classifier at the very end?

Looking forward to your kind reply.

Thanks.

DonkeyShot21 commented 2 years ago

Hi, thanks for your interest in our paper. We use classification accuracy merely as a proxy for evaluating the quality of the representations. The reported accuracy is the linear evaluation accuracy at the end of the continual learning sequence. To calculate forgetting, we also train a linear classifier after each task. Note that we operate in the class-incremental setting, so we train only one linear classifier for all classes. For the semi-supervised experiments we do exactly the same, except that only 10% of the data is used for linear evaluation (following common practice in self-supervised learning papers).
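
For concreteness, here is a minimal sketch of that linear-evaluation protocol in PyTorch (not the exact code in this repo; `encoder`, `feat_dim`, `num_classes`, and `train_loader` are placeholders):

```python
# Minimal linear-evaluation sketch (not the actual code from this repo):
# the backbone is frozen and a single linear head is trained over ALL
# classes, as in the class-incremental setting. `encoder`, `feat_dim`,
# `num_classes`, and `train_loader` are hypothetical placeholders.
import torch
import torch.nn as nn

def linear_eval(encoder, feat_dim, num_classes, train_loader,
                epochs=100, device="cuda"):
    encoder.eval()
    for p in encoder.parameters():  # freeze the representations
        p.requires_grad = False

    classifier = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():   # features come from the frozen encoder
                feats = encoder(images)
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```

For the semi-supervised experiments, `train_loader` would simply be built from the 10% labeled subset; nothing else changes.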

newhand555 commented 2 years ago

Thanks for your reply.

Please correct me if I am mistaken. The model has two types of classifiers:

  1. To report the accuracy, you train a linear classifier after the encoder's entire continual learning sequence. This classifier is trained with the frozen trained encoder and all of the data (or a subset, in the semi-supervised setting) from all classes.
  2. To report the forgetting, you train a set of linear classifiers for each task during the continual learning process. These will be compared with the type-1 classifier.

Therefore, the encoder is trained with continual learning, but the linear classifier is not (it is trained with supervised learning). Is that a correct understanding?

Looking forward to your kind reply.

Thank you very much.

DonkeyShot21 commented 2 years ago

Yes, I think your description is correct. Just note again that, since we operate in the class-incremental setting, we train one linear classifier per task, not a set of linear classifiers per task as you suggested. Then, to calculate forgetting, we just forward the validation data for each task separately to get the accuracy on each task; see the sketch below.
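
As a rough sketch, the forgetting computation could look like the following, assuming `acc[t][i]` holds the accuracy on task `i`'s validation split measured after training on task `t` (this is the common "drop from best past accuracy" definition, not code copied from the repo):

```python
# Sketch of a common forgetting metric (not copied from the repo), assuming
# acc[t][i] is the linear-eval accuracy on task i's validation split,
# measured after training on task t (so acc[t] has length t + 1).
def forgetting(acc):
    T = len(acc)                    # number of tasks
    drops = []
    for i in range(T - 1):          # the last task cannot be forgotten yet
        best_past = max(acc[t][i] for t in range(i, T - 1))
        drops.append(best_past - acc[T - 1][i])
    return sum(drops) / len(drops)
```

For example, `acc = [[0.90], [0.85, 0.92], [0.80, 0.88, 0.91]]` gives `((0.90 - 0.80) + (0.92 - 0.88)) / 2 = 0.07`.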

Yes, the representations are learned continually and self-supervised, as the name of the setting implies (CSSL: Continual Self-Supervised Learning). For evaluation, the linear classification and the other downstream tasks are performed with supervision, although you could also perform clustering on top of the representations, or whatever you like.
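
For instance, an unsupervised evaluation could be as simple as running k-means on the frozen features; a minimal sketch with scikit-learn, where `encoder` and `loader` are placeholders:

```python
# Unsupervised alternative: k-means on the frozen features, using
# scikit-learn. `encoder` and `loader` are hypothetical placeholders.
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def cluster_features(encoder, loader, num_clusters, device="cuda"):
    encoder.eval()
    feats = torch.cat([encoder(x.to(device)).cpu() for x, _ in loader])
    return KMeans(n_clusters=num_clusters).fit_predict(feats.numpy())
```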