facebookresearch / suncet

Code to reproduce the results in the FAIR research papers "Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples" https://arxiv.org/abs/2104.13963 and "Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations" https://arxiv.org/abs/2006.10803
MIT License

Using pretrained networks and small amounts of labelled data #19

Open brocksar opened 3 years ago

brocksar commented 3 years ago

Firstly, great and interesting work on PAWS! I have been adapting PAWS to my own use case and have a couple of questions. Have you ever used a pretrained model as the backbone and then trained it further with PAWS? In particular, I'm trying to take a backbone pretrained on ImageNet and use PAWS to tune its weights on a smaller dataset with very few labels per class. Have you ever experimented with a training setup like this, and what would you recommend hyperparameter-wise?
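For concreteness, here is a minimal sketch of the kind of initialization I mean, assuming a torchvision ResNet-50; the suncet backbone may use different parameter names, so this is an illustration rather than the repo's actual loading code:

```python
import torch
import torchvision


def load_imagenet_init(paws_encoder: torch.nn.Module) -> torch.nn.Module:
    """Copy matching ImageNet-pretrained weights into a PAWS-style backbone."""
    # Newer torchvision releases use `weights=ResNet50_Weights.IMAGENET1K_V1` instead.
    pretrained = torchvision.models.resnet50(pretrained=True).state_dict()
    # strict=False leaves the projection head (and any renamed layers)
    # at their existing initialization instead of raising on key mismatches.
    missing, unexpected = paws_encoder.load_state_dict(pretrained, strict=False)
    print(f'missing keys: {len(missing)}, unexpected keys: {len(unexpected)}')
    return paws_encoder
```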

My second question touches on the 'very few labels per class' I just mentioned. In the ablation study in the paper you go down to 4 labels per class. Have you experimented with anything lower? I've been working on training PAWS with one labeled image per class, and was wondering what insights you had on training in this case. I know you have mentioned that increasing the number of classes per support batch is important, but I was wondering if there was anything else.
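To illustrate why the number of classes per support batch matters so much at 1 label per class: the support batch size collapses to just the class count times the number of views. The variable names below are illustrative, not the repo's config keys:

```python
# Illustrative arithmetic only; names are hypothetical.
classes_per_batch = 15     # classes drawn into each support batch
imgs_per_class = 1         # labelled images available per class
supervised_views = 2       # augmented views taken of each support image

support_batch_size = classes_per_batch * imgs_per_class * supervised_views
print(support_batch_size)  # 30 support samples per batch
```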

Thanks so much!

jmarrietar commented 3 years ago

I used an ImageNet-pretrained ResNet backbone and continued training with PAWS (unlabeled + labeled) on images from my domain (medical imaging), and it works, although for the hyperparameters I only changed the learning rate and the number of classes. I have the same question hyperparameter-wise, because even though the model is learning, I get better results when I pre-train with the SimCLR approach instead.

Any directions or thoughts in this regard, @MidoAssran?

Thank you very much!

brocksar commented 3 years ago

I haven't tried using the SimCLR semi-supervised loss, but I have noticed that I get better results when I just do supervised fine-tuning and no semi-supervised training, which is very bizarre.

How much did you lower the learning rate @jmarrietar ?

jmarrietar commented 3 years ago

@brocksar My current configuration is `lr: 0.001` and `final_lr: 0.00001`.
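For context, these two values typically feed a warmup-then-cosine schedule in PAWS-style training. A minimal sketch (the warmup length and `start_lr` here are assumptions, not my exact settings):

```python
import math


def lr_at(step, total_steps, warmup_steps=100,
          start_lr=0.0002, ref_lr=0.001, final_lr=0.00001):
    """Linear warmup to ref_lr, then cosine decay down to final_lr."""
    if step < warmup_steps:
        return start_lr + (ref_lr - start_lr) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (ref_lr - final_lr) * (1 + math.cos(math.pi * progress))
```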

Are you doing the subsequent fine-tuning on the labeled data (as described in the paper) and still getting worse results?

brocksar commented 3 years ago

@jmarrietar Okay, I've been using similar learning rates. And yes, I run the semi-supervised PAWS training on the pretrained backbone plus the projection head, then replace the projection head with a linear classification head and do supervised fine-tuning (on the labeled data only) on top of that. What I was saying is that the results are better when I do only the supervised fine-tuning and skip the semi-supervised training entirely, which is weird.
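Roughly, the fine-tuning step I mean looks like this (a sketch with torchvision standing in for the PAWS-trained encoder, not the repo's actual fine-tuning code):

```python
import torch
import torchvision

num_classes = 10  # placeholder for the downstream label count

backbone = torchvision.models.resnet50()         # stands in for the PAWS-trained encoder
backbone.fc = torch.nn.Identity()                # drop the projection / original head
classifier = torch.nn.Linear(2048, num_classes)  # new linear classification head

model = torch.nn.Sequential(backbone, classifier)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, nesterov=True)
criterion = torch.nn.CrossEntropyLoss()
# ...standard supervised loop over the small labelled set only...
```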

MidoAssran commented 3 years ago

Hi @brocksar and @jmarrietar, sorry for the delay getting back to you; I was on vacation!

But those sound like very interesting use-cases. In terms of hyper-parameters, I think it's totally reasonable to only change the learning rate, so that part is fine.

Could you clarify: are you using your labeled, domain-specific images in the support set? If so, you would have to implement the proper sampler for them. Happy to take a look at your code!
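To illustrate what such a sampler needs to guarantee, here is a bare-bones class-stratified batch-sampler sketch (the repository's own class-stratified sampler is the reference; this just shows the idea of drawing an equal number of labelled images from each sampled class):

```python
import random
from collections import defaultdict

from torch.utils.data import Sampler


class SimpleClassStratifiedSampler(Sampler):
    """Yields support batches with `imgs_per_class` samples from each of
    `classes_per_batch` randomly chosen classes."""

    def __init__(self, labels, classes_per_batch, imgs_per_class, num_batches):
        self.by_class = defaultdict(list)
        for idx, y in enumerate(labels):
            self.by_class[y].append(idx)
        self.classes_per_batch = classes_per_batch
        self.imgs_per_class = imgs_per_class
        self.num_batches = num_batches

    def __iter__(self):
        for _ in range(self.num_batches):
            classes = random.sample(list(self.by_class), self.classes_per_batch)
            batch = []
            for c in classes:
                # sample with replacement so 1 labelled image per class still works
                batch += random.choices(self.by_class[c], k=self.imgs_per_class)
            yield batch

    def __len__(self):
        return self.num_batches
```

Something like this would be handed to the labelled DataLoader through its `batch_sampler` argument, since it yields whole batches of indices rather than single indices.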