Open zilangch opened 3 years ago
Hi @zilangch !
In the unsupervised setting, we train the encoder without action labels, using only the self-supervised learning tasks. We then evaluate the feature representations extracted by the encoder with a classifier trained in a supervised manner: the classifier is trained with action labels on the features extracted by the encoder.
The baseline "MS^2L Rand-Unsupervised" refers to a randomly initialized encoder. We use the encoder without any training to extract features.
The unsupervised approach means that we train the encoder in an unsupervised setting to extract features, and we evaluate the quality of those features by action recognition. Other downstream tasks, such as action prediction, could also be used for evaluation.
Hope this is helpful.
In your first paragraph, you say "This classifier is trained with action labels on the features extracted by the encoder". Are those pseudo labels or true labels? If they are pseudo labels, are they generated by the encoder? I don't understand how pseudo labels could verify the encoder's performance.
Could you give an example with a jigsaw puzzle?
Could the encoder and classifier be trained jointly in the unsupervised setting?
They are true labels. We employ the classifier to perform action recognition. Fig. 3 in the paper shows an example of the jigsaw puzzle task. We shuffle the action sequences along the temporal dimension and train the network to predict how they were shuffled.
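To make the jigsaw idea concrete, here is a minimal sketch of how a temporal jigsaw training sample could be built. It is an illustration only, not the code from the paper: the function name `make_jigsaw_sample`, the choice of three segments, and the use of the permutation index as the label are all assumptions.

```python
import itertools
import random


def make_jigsaw_sample(sequence, num_segments=3, rng=None):
    """Split a frame sequence into segments, shuffle them, and return
    (shuffled_sequence, permutation_label).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    rng = rng or random.Random()
    # Every possible segment ordering; its index in this list is the class label.
    permutations = list(itertools.permutations(range(num_segments)))
    seg_len = len(sequence) // num_segments
    # Trailing frames that do not fill a whole segment are dropped in this sketch.
    segments = [sequence[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]
    label = rng.randrange(len(permutations))
    order = permutations[label]
    shuffled = [frame for idx in order for frame in segments[idx]]
    return shuffled, label


# Stand-in for a skeleton sequence: 12 frames identified by their index.
frames = list(range(12))
shuffled, label = make_jigsaw_sample(frames, num_segments=3, rng=random.Random(0))
```

The network would receive `shuffled` as input and be trained to predict `label`. No action labels are involved, since the label comes from the shuffling itself.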
The unsupervised setting requires us to train the encoder without action labels, whereas the classifier is trained with action labels. Thus, we cannot train the encoder and classifier jointly in the unsupervised setting.
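The evaluation protocol described above (frozen encoder, classifier trained with true labels on its features) can be sketched roughly as follows. This is a toy illustration under loud assumptions: the "encoder" is a fixed random projection standing in for the real network (which also happens to mirror the random-encoder baseline), the classifier is a simple multi-class perceptron, and the data is synthetic. None of the names come from the paper's code.

```python
import math
import random


def make_frozen_encoder(in_dim, feat_dim, seed=0):
    """Return a fixed random projection standing in for a frozen encoder (assumption)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(feat_dim)]

    def encode(x):
        # The encoder weights are never updated after creation.
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return encode


def predict(weights, feat):
    """Return the class whose weight vector scores highest on the feature."""
    scores = [sum(w * f for w, f in zip(row, feat)) for row in weights]
    return scores.index(max(scores))


def train_linear_classifier(features, labels, num_classes, epochs=20):
    """Multi-class perceptron trained on fixed features with true labels."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for feat, label in zip(features, labels):
            pred = predict(weights, feat)
            if pred != label:
                # Only the classifier weights change; the encoder is untouched.
                for j in range(dim):
                    weights[label][j] += feat[j]
                    weights[pred][j] -= feat[j]
    return weights


def accuracy(weights, features, labels):
    correct = sum(predict(weights, f) == y for f, y in zip(features, labels))
    return correct / len(labels)


# Synthetic two-class data: 50 samples per class, 4 dimensions each.
rng = random.Random(1)
inputs = [[rng.gauss(2.0 * y, 1.0) for _ in range(4)] for y in (0, 1) for _ in range(50)]
labels = [y for y in (0, 1) for _ in range(50)]

encoder = make_frozen_encoder(in_dim=4, feat_dim=8)
features = [encoder(x) for x in inputs]

# Even indices for training, odd indices for testing.
train_x, train_y = features[0::2], labels[0::2]
test_x, test_y = features[1::2], labels[1::2]

clf = train_linear_classifier(train_x, train_y, num_classes=2)
acc = accuracy(clf, test_x, test_y)
```

The reported number is `acc`, the classifier's accuracy on held-out features. Because the encoder is frozen, differences in accuracy between encoders reflect the quality of their features rather than the classifier.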
I read your paper and am confused about how you calculate the accuracy of the unsupervised model, and what exactly the baseline model "MS^2L Rand-Unsupervised" is. Looking forward to your reply.