I have some questions and would like to seek your clarification

Dylan-H-Wang / skin-sm3

Official Pytorch implementation of Self-Supervised Multi-Modality Learning for Multi-Label Skin Lesion Classification (SM3). Code will be available upon paper acceptance.

Apache License 2.0


After reading your article, I have a few questions that I would like you to clarify:

1. Were the SM3 pre-training weights used in your paper obtained through pre-training on the Derm7pt dataset or the ImageNet dataset?
2. From reading the article, I understand that the SM3 method does not use the dataset labels at any point during pre-training and instead learns features in a self-supervised way from the images alone. The label data is then used only in the "fine-tuning" stage for the classification task. Is this understanding correct? How is the fine-tuning conducted?
3. In your article, you mention using "Label Projection Heads" to project features extracted from images into a label-specific embedding space to generate pseudo-label features. I don't quite understand how this projection is implemented, given that no label information is used. How are the pseudo-labels generated through this projection? What form do these pseudo-label features take, and what information do they contain?

Hi @1DJ1127,

Thank you for your interest!

SM3 pre-training weights were obtained by pre-training the model using SM3 on the Derm7pt dataset.
Yes, your understanding is correct. For evaluation we follow the linear probe protocol, a standard approach to evaluating SSL algorithms: all pre-trained layers are frozen and only the final fully connected (FC) layer is trained on the labels. For more details, please refer to Section 4.1 in MoCo and Supplementary A in MAE.
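
For readers unfamiliar with linear probing, here is a minimal PyTorch sketch of the idea, not the SM3 evaluation code. The tiny `backbone`, the single classification head, `num_classes`, and the random batch are all placeholders assumed for illustration; the real setup uses the SM3 pre-trained encoder and the Derm7pt labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder for an SSL pre-trained encoder (hypothetical, not the SM3 backbone).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
num_classes = 5  # assumed number of classes, for illustration only
classifier = nn.Linear(8, num_classes)  # the only trainable layer

# Freeze every backbone parameter and keep the backbone in eval mode.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

# The optimizer only ever sees the classifier's parameters.
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()


def probe_step(images, labels):
    """One supervised step that updates only the final FC layer."""
    with torch.no_grad():  # no gradients flow into the frozen backbone
        feats = backbone(images)
    loss = criterion(classifier(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Random stand-in batch; real training would iterate over labeled images.
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, num_classes, (4,))

backbone_before = [p.detach().clone() for p in backbone.parameters()]
bias_before = classifier.bias.detach().clone()
loss = probe_step(images, labels)
```

Because the labels only ever reach `classifier`, the probe accuracy reflects how linearly separable the frozen self-supervised features are.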
The pseudo-labeling approach was inspired by the DeepCluster paper, which demonstrates that unsupervised clustering can generate pseudo-labels to train neural networks in a self-reinforcing manner. In our work, we extend this idea to a multi-label, multi-class setting. Features before the "Label Projection Heads" capture general image information, while the "Label Projection Heads" distill label-specific features.
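
To make the idea more concrete, below is a rough DeepCluster-style sketch of how clustering can produce pseudo-labels without any ground-truth labels. It is an illustration under stated assumptions, not the SM3 implementation: the label names, the number of clusters per label, the linear heads, the hand-written `kmeans` helper, and the random `features` tensor are all hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def kmeans(x, k, iters=10):
    """Plain Lloyd's k-means; returns one cluster index per row of x."""
    centroids = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)
        for c in range(k):
            members = x[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(dim=0)
    return torch.cdist(x, centroids).argmin(dim=1)


feat_dim, embed_dim = 16, 8
# Hypothetical label names and cluster counts (one entry per label).
clusters_per_label = {"label_a": 3, "label_b": 2}

# One projection head and one pseudo-label classifier per label.
heads = nn.ModuleDict(
    {name: nn.Linear(feat_dim, embed_dim) for name in clusters_per_label}
)
classifiers = nn.ModuleDict(
    {name: nn.Linear(embed_dim, k) for name, k in clusters_per_label.items()}
)

# Stand-in for the general image features produced by the encoder.
features = torch.randn(32, feat_dim)

criterion = nn.CrossEntropyLoss()
pseudo_labels = {}
total_loss = 0.0
for name, k in clusters_per_label.items():
    z = heads[name](features)  # label-specific embedding
    with torch.no_grad():
        # Cluster assignments act as pseudo-labels; no true labels are used.
        pseudo_labels[name] = kmeans(z.detach(), k)
    total_loss = total_loss + criterion(classifiers[name](z), pseudo_labels[name])

total_loss.backward()  # gradients reach every head and classifier
```

In this sketch, each pseudo-label is just a cluster index in one head's embedding space, so it carries grouping information about visually similar images rather than any clinical meaning. In the DeepCluster scheme, clustering and training alternate so that the assignments and the features refine each other.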

Kind regards

