Closed: newhand555 closed this issue 2 years ago
Hi, thanks for your interest in our paper. We use classification accuracy merely as a proxy for evaluating the quality of the representations. The reported accuracy is the linear-evaluation accuracy at the end of the continual learning sequence. To calculate forgetting, we also train a linear classifier after each task. Note that we operate in the class-incremental setting, so we train only one linear classifier for all classes. For the semi-supervised experiments we do exactly the same, but only 10% of the data is used for linear evaluation (as is standard in self-supervised learning papers).
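In case it helps to see the protocol spelled out, here is a minimal PyTorch sketch (not the exact code from our repo; `encoder`, `train_loader`, `feat_dim`, and `num_classes` are placeholders): the continually learned encoder is frozen and a single linear head over all classes is trained on top of its features.

```python
import torch
import torch.nn as nn

# Minimal sketch of the linear-evaluation protocol described above.
# A single linear head over ALL classes (class-incremental setting)
# is trained on top of the frozen, continually learned encoder.
def linear_eval(encoder, train_loader, feat_dim, num_classes, epochs=10, device="cuda"):
    encoder.eval()                            # freeze the encoder
    for p in encoder.parameters():
        p.requires_grad = False

    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:             # for the semi-supervised setting, this
            x, y = x.to(device), y.to(device) # loader would hold only ~10% of the labels
            with torch.no_grad():
                feats = encoder(x)            # representations stay fixed
            loss = loss_fn(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```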
Thanks for your reply.
If I make any mistakes, please correct me. The model has two types of classifiers.
So the encoder is trained with continual learning, but the linear classifier is not (it is trained with supervised learning). Is that a correct understanding?
Looking forward to your kind reply.
Thank you very much.
Yes, I think your description is correct. Just note again that, since we operate in the class-incremental setting, we train one linear classifier over all classes, not a separate classifier per task as you suggested. Then, to calculate forgetting, we just forward the validation data for each task separately to get the accuracy on each task.
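Roughly, the forgetting computation looks like the sketch below (again just an illustration, not our actual evaluation code; `task_val_loaders` is a hypothetical list with one validation loader per task).

```python
import torch

# Sketch: evaluate the single linear head on each task's validation
# set separately, then derive forgetting from the accuracy matrix.
@torch.no_grad()
def per_task_accuracy(encoder, head, task_val_loaders, device="cuda"):
    accs = []
    for loader in task_val_loaders:           # one loader per task
        correct = total = 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            preds = head(encoder(x)).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.numel()
        accs.append(correct / total)
    return accs

def forgetting(acc_matrix):
    # acc_matrix[t][i] = accuracy on task i after linear eval at step t;
    # forgetting = mean over past tasks of (best past accuracy - final accuracy)
    T = len(acc_matrix)
    drops = [max(acc_matrix[t][i] for t in range(T - 1)) - acc_matrix[-1][i]
             for i in range(T - 1)]
    return sum(drops) / len(drops)
```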
Yes, the representations are learned continually and self-supervised, as the name of the setting implies (CSSL: Continual Self-Supervised Learning). For evaluation, the linear classifier and the other downstream tasks use supervision, although you could also perform clustering on top of the representations, or any other analysis you like.
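For example, a label-free alternative could look like this sketch, which clusters the frozen features with k-means and scores the clusters against the ground-truth labels via NMI (`encoder` and `val_loader` are the same placeholders as in the earlier sketches).

```python
import torch
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Sketch of a label-free evaluation: cluster the frozen representations
# and score the assignments against ground-truth labels with NMI.
@torch.no_grad()
def clustering_eval(encoder, val_loader, num_classes, device="cuda"):
    feats, labels = [], []
    for x, y in val_loader:
        feats.append(encoder(x.to(device)).cpu())
        labels.append(y)
    feats = torch.cat(feats).numpy()
    labels = torch.cat(labels).numpy()

    assignments = KMeans(n_clusters=num_classes, n_init=10).fit_predict(feats)
    return normalized_mutual_info_score(labels, assignments)
```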
Hello,
I have read your paper. It is very impressive. I have a question about the class-incremental setting and am wondering if you can answer it.
Did you train the classifier for each task only during that task's embedding training? Or did you re-train all classifiers after the embedding training for all tasks had finished? I see that the embeddings of a previous task may change after the next task is trained. How does the old classifier, trained on the old embedding space, handle these changed embeddings? Your paper mentions "a subset, e.g., 10% of the data". Does this mean using 10% of the data to retrain the classifier at the very end?
Looking forward to your kind reply.
Thanks.