DonkeyShot21 / cassle

Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)
MIT License

Some questions about training and evaluation process #14

Closed AndrewTal closed 1 year ago

AndrewTal commented 1 year ago

Hello,

Thank you for your fantastic project! I have some questions regarding model evaluation. 1) Taking CIFAR-10 as an example, if there are 2 tasks, each with 5 classes, is the process shown in the following figure correct?

[Screenshot 2023-03-08 12:01:14]

2) If it is correct, then after the self-supervised continual learning part is completed, a 10-class classifier is trained. When training this 10-class classifier, is the data from all categories used simultaneously?

3) Additionally, what is the overall process for fine-tuning (using Table 2 as an example, Strategy 1 Fine-tuning)? Does it simply replace CaSSLe with a non-continual-learning SSL method?

[Screenshot 2023-03-08 12:13:03]

Thanks!

AndrewTal commented 1 year ago

Also, are the backbone (feature extractor) parameters frozen when training the above classifier? Thanks!

DonkeyShot21 commented 1 year ago

Hello!

1) Yes, but IIRC for forgetting we train a single 10-class classifier after each task (the same as for linear evaluation accuracy) and then compute the accuracy only on the test samples of each task, without masking the classifier (i.e. class-incremental, not task-incremental). You can use the per-task accuracy logged by our code on wandb.

2) Yes, this is done with the sole purpose of evaluating the quality of the representations.

3) No, it is the standard fine-tuning used throughout the continual learning literature, which goes more or less as follows: (i) take some data (task 1); (ii) train a model on task 1 using an SSL method; (iii) discard the data for the current task and take the data for a new task (task 2); (iv) keep training (fine-tuning) the model on task 2 with the same SSL method; (v) repeat.

4) Yes, the backbone is frozen.
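The sequential fine-tuning strategy in (3) can be sketched as a single loop over tasks that keeps updating the same model. This is a minimal PyTorch illustration, not the repo's actual training code: the tiny linear backbone, the noise "augmentations", and the SimSiam-style `ssl_loss` are all placeholders standing in for a real SSL method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def ssl_loss(z1, z2):
    # placeholder SSL objective: negative cosine similarity between two views
    return -nn.functional.cosine_similarity(z1, z2, dim=-1).mean()

# toy backbone standing in for a ResNet feature extractor
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128))
opt = torch.optim.SGD(backbone.parameters(), lr=0.01)

# two tasks, each a batch of unlabeled images (dummy data here)
tasks = [torch.randn(64, 3, 32, 32) for _ in range(2)]

for task_data in tasks:      # (i)/(iii): take the data for one task at a time
    for _ in range(5):       # (ii)/(iv): keep training the SAME model on it
        view1 = task_data + 0.1 * torch.randn_like(task_data)  # crude "augmentation"
        view2 = task_data + 0.1 * torch.randn_like(task_data)
        loss = ssl_loss(backbone(view1), backbone(view2))
        opt.zero_grad()
        loss.backward()
        opt.step()
# no replay buffer, no distillation: previous-task data is simply discarded
```

The key point of the fine-tuning baseline is exactly what the loop shows: nothing carries over between tasks except the model weights themselves.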

AndrewTal commented 1 year ago

Thanks for your reply!

Regarding question 1: with respect to forgetting, you mentioned that the classifier has 10 classes for each task. Do you train that classifier using labeled data from all 10 classes, or only the labeled data from the 5 classes of the current task?

Regarding question 3: since self-supervised learning is added to the continual learning paradigm, the steps differ somewhat from the conventional ones. If I understand correctly, is the fine-tuning process like this: [(SSL on task 1 unlabeled data) -> (SSL on task 2 unlabeled data) -> (fine-tune with labeled data from all 10 classes)]? And is the forgetting calculation for fine-tuning the same as in question 1 (a 10-class classifier trained after all SSL steps finish)?

DonkeyShot21 commented 1 year ago

1) The linear evaluation is exactly the same as the one performed at the end; therefore, yes, we use the data for all classes to learn the linear classifier, and then test the classifier on each task.

3) Hmm, if you want to see it like that; but generally the self-supervised part does not include the final linear evaluation. So it is more like: CSSL = [(SSL on task 1 unlabeled data) -> (SSL on task 2 unlabeled data)]; then, in order to evaluate the representations, we train a linear classifier on all the data (the backbone is frozen, not fine-tuned). And yes, the forgetting calculation is the same for all methods.
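The evaluation protocol described above (frozen backbone, one 10-class linear head trained on all classes, then class-incremental accuracy measured per task) can be sketched roughly as follows. This is an illustrative toy, not the repo's evaluation code: the linear backbone, the random data, and the 2-task CIFAR-10-style class split are all stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
feat_dim, n_classes = 128, 10

# stand-in for the SSL-pretrained feature extractor
backbone = nn.Linear(32 * 32 * 3, feat_dim)
for p in backbone.parameters():
    p.requires_grad = False          # frozen: only the linear head is trained

head = nn.Linear(feat_dim, n_classes)
opt = torch.optim.SGD(head.parameters(), lr=0.1)

# dummy labeled data covering ALL 10 classes at once
x = torch.randn(256, 32 * 32 * 3)
y = torch.randint(0, n_classes, (256,))

for _ in range(20):
    logits = head(backbone(x))
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# class-incremental evaluation: per-task accuracy WITHOUT masking the head,
# i.e. the classifier may predict any of the 10 classes for every sample
task_classes = [range(0, 5), range(5, 10)]   # 2 tasks of 5 classes each
preds = head(backbone(x)).argmax(dim=1)
for t, classes in enumerate(task_classes):
    mask = torch.isin(y, torch.tensor(list(classes)))  # this task's samples
    acc = (preds[mask] == y[mask]).float().mean()
```

Note that masking the head to only this task's 5 logits would turn this into task-incremental evaluation, which is the easier setting the maintainer says is not used here.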

AndrewTal commented 1 year ago

all clear, thanks! ^-^