MIC-DKFZ / nnUNet


Is it possible to use unlabelled data for pretraining? #2338

Open menna1012 opened 2 months ago

menna1012 commented 2 months ago

Hi,

I have about 500 labelled images and 1500 unlabelled images. I wonder if it is possible to pretrain a model using those unlabelled images and then use the pretrained weights to initialize the model to train on the labelled dataset.

I followed these steps: https://github.com/Kent0n-Li/nnSAM/blob/main/documentation/pretraining_and_finetuning.md but got stuck at nnUNetv2_extract_fingerprint and nnUNetv2_move_plans_between_datasets, because both require labels (the "labels" field in dataset.json and labelled images in the labelsTr folder).

Is there a way to use the unlabelled images to improve overall training results on the labelled ones, or does nnUNet only work with labelled datasets?

Thanks in advance for your response.

mrokuss commented 1 month ago

Hey @menna1012

Interesting and very valid question. Generally, this is not possible out of the box: for self-supervised pretraining you would need to modify the code yourself. What you could do to leverage the unlabelled images, however, is the following:

  1. Train a model on the labelled images (Dataset001)
  2. Use this model to predict segmentations for the unlabelled images
  3. Train another model on the unlabelled images with the predicted segmentations as targets (Dataset002)
  4. Use this checkpoint as pretrained weights for another training on Dataset001. Use a reduced learning rate here (fine-tuning)
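
As a rough sketch, the four steps above could map onto the nnU-Net v2 command line roughly as follows. The commands follow the pretraining and fine-tuning documentation, but the folder names, the fold (`0`), the plans identifier `nnUNetPlans_from_labelled` and the checkpoint path are placeholders I made up for illustration, so adapt them to your setup. The helper only builds and prints the commands; it does not run anything.

```python
import shlex


def pseudo_label_commands(
    labelled_id=1,                              # Dataset001: labelled images
    pseudo_id=2,                                # Dataset002: unlabelled images + predictions
    config="3d_fullres",                        # assumed configuration
    unlabelled_dir="unlabelled_images",         # hypothetical input folder
    pseudo_label_dir="pseudo_labels",           # hypothetical output folder
    pseudo_plans="nnUNetPlans_from_labelled",   # hypothetical plans identifier
    checkpoint="PATH_TO_CHECKPOINT",            # placeholder, fill in after step 3
):
    """Return the CLI calls for the pseudo-labelling workflow as argument lists."""
    src, tgt = str(labelled_id), str(pseudo_id)
    return [
        # Step 1: plan, preprocess and train on the labelled dataset.
        ["nnUNetv2_plan_and_preprocess", "-d", src, "--verify_dataset_integrity"],
        ["nnUNetv2_train", src, config, "0"],
        # Step 2: predict segmentations for the unlabelled images.
        ["nnUNetv2_predict", "-i", unlabelled_dir, "-o", pseudo_label_dir,
         "-d", src, "-c", config, "-f", "0"],
        # Manual step (not a command): build Dataset002 from the unlabelled
        # images (imagesTr) and the predictions (labelsTr), plus a dataset.json.
        # Step 3: reuse the plans of Dataset001 so both networks match, then train.
        ["nnUNetv2_extract_fingerprint", "-d", tgt],
        ["nnUNetv2_move_plans_between_datasets", "-s", src, "-t", tgt,
         "-sp", "nnUNetPlans", "-tp", pseudo_plans],
        ["nnUNetv2_preprocess", "-d", tgt, "-plans_name", pseudo_plans],
        ["nnUNetv2_train", tgt, config, "all", "-p", pseudo_plans],
        # Step 4: fine-tune on the labelled dataset from the pretrained weights.
        ["nnUNetv2_train", src, config, "0", "-pretrained_weights", checkpoint],
    ]


if __name__ == "__main__":
    for cmd in pseudo_label_commands():
        print(shlex.join(cmd))
```

Note that, as far as I know, `nnUNetv2_train` has no flag for the learning rate, so the reduced learning rate in step 4 would probably need a small custom trainer (for example a subclass of `nnUNetTrainer` that lowers `initial_lr`, selected with `-tr`).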

Hope this helps!