AhmetSencan / MaskSplit-Self-supervised-Meta-learning-for-Few-shot-Semantic-Segmentation

Code for our method MaskSplit. Paper is available at https://arxiv.org/abs/2110.12207.

Data preparation #1

Closed TNA8 closed 2 years ago

TNA8 commented 2 years ago

Thanks for your excellent project.

I have a question about the PASCAL dataset.

Thanks in advance.

AhmetSencan commented 2 years ago

Hi @TNA8,

Thank you for your kind words.

I hope this helps.

TNA8 commented 2 years ago

Thanks for your kind reply.

I have a dataset with one class: pairs of an input image (containing many objects of that one class) and a labeled mask.

To apply your approach, I have to create SegmentationClassAug and saliency_unsupervised_model from the original masks. Then, to train the model, what specific configuration is needed for this one-class problem? Could you help me?

Best regards,

AhmetSencan commented 2 years ago

Firstly, to create SegmentationClassAug, you should put all the labels under a directory with this name. To create saliency_unsupervised_model:

1. Download BAS-NET.
2. From the following link you can download the model that we trained with the previously mentioned method: BAS-NET pretrained.
3. As explained in BAS-NET, create the saved_models/basnet_bsi/ directory and copy basnet.pth there.
4. Put your images under test_data/test_images/ and run BAS-NET with the command python basnet_test.py to get the unsupervised saliency estimations for your images under the directory test_data/test_results/.
5. Finally, put these results under a directory named saliency_unsupervised_model.

For the masks under both SegmentationClassAug and saliency_unsupervised_model, the names should match the names of your images except for the extension (i.e. "JPEGImages/2008_000008.jpg", "SegmentationClassAug/2008_000008.png", and "saliency_unsupervised_model/2008_000008.png").
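As a quick sanity check, something along these lines (just an illustrative sketch, not code from the repository; data_root is a placeholder) can verify that every image has a matching ground-truth mask and saliency estimation with the expected names:

```python
import os

# Assumed layout, mirroring the directory names above (data_root is a placeholder):
#   <data_root>/JPEGImages/2008_000008.jpg
#   <data_root>/SegmentationClassAug/2008_000008.png
#   <data_root>/saliency_unsupervised_model/2008_000008.png
data_root = "path/to/your/data_root"

for image_name in sorted(os.listdir(os.path.join(data_root, "JPEGImages"))):
    stem, _ = os.path.splitext(image_name)
    for mask_dir in ("SegmentationClassAug", "saliency_unsupervised_model"):
        mask_path = os.path.join(data_root, mask_dir, stem + ".png")
        if not os.path.exists(mask_path):
            print("missing:", mask_path)
```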

For the configuration, here is a sample:

DATA:
  train_split: 0
  sup_aug: True
  query_aug: True
  image_size: 400
  use_all_classes: False 
  use_split_coco: False
  train_name: pascal
  test_name: default
  test_split: default
  train_list: lists/pascal/train_masksplit.txt  # change to your own training list
  data_root:   # do not forget to add path to root
  val_list: lists/pascal/val.txt # change to your own validation list
  workers: 4
  vcrop_range: [-40,40]
  vcrop_ignore_support: True
  alternate: True
  vsplit_prob: 1.0
  hsplit_prob: 0.0
  hsplit: False
  vsplit: True
  num_classes_val: 1

TRAIN:
  ckpt_path: checkpoints/
  batch_size: 16
  epochs: 100
  strategy: "unsupervised_fbf"

EVALUATION:
  shot: 1
  visualize: False
  ckpt_used: ""

MODEL:
  arch: resnet
  layers: 101
  pretrained: True  # Means the backbone has been pre-trained
  model_name: Masksplit

DISTRIBUTED:
  gpus: 0

In this configuration, change train_list, val_list and data_root. Moreover, in src/dataset/dataset.py, replace line 68 with class_list = [1]; similarly, in the same file, replace lines 269-273 with the single line class_list = [1].
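In other words (just as an illustration, the surrounding code is not reproduced here), both places in src/dataset/dataset.py reduce to:

```python
# src/dataset/dataset.py
# line 68, and likewise the block at lines 269-273, simply become:
class_list = [1]  # single foreground class for the one-class dataset
```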

One final thing: I think there has been a recent update to PyTorch Lightning, so please use the version given in the requirements.

I hope this helps. Edit: num_classes_val should also be set to 1 in the config file.

TNA8 commented 2 years ago

Thanks a lot for your detailed explanation. I will try it and will be back with the result. :)

TNA8 commented 2 years ago

One question: this approach is based on self-supervised learning, a.k.a. unsupervised learning. To my knowledge, self-supervised learning trains a model on unlabeled data by creating pseudo-labels. After that, we can use the pretrained model for a downstream task using labeled data.

Why do we need labeled data for training?

Thanks.

AhmetSencan commented 2 years ago

The ground truth masks are only used for validation and testing purposes. The copy_paste_loader function provides the dataloader that we use for training, and if you check the code, you can see it only provides the saliency for training.

Moreover, our approach does not exactly create a pretrained model that can be used for a downstream task; it directly creates a few-shot segmentation model. Given an image, we first divide the saliency estimation in half, using a line with a slope. Then we apply two sets of augmentations to obtain a task on which we can train our model. Basically, the goal is, given one half of the saliency mask, to predict the other half. Our experiments show that applying different augmentations to the support and the query, combined with using different halves of the saliency estimation for the support and the query, enables us to generalize to the few-shot task.
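To make the idea concrete, here is a rough sketch of the episode construction (the function and variable names are mine, not the repository's API; the actual implementation also handles the vertical/horizontal split choices and the vcrop options from the config):

```python
import numpy as np

def split_saliency(saliency, slope=0.0):
    """Split a binary saliency mask into two halves along a (possibly sloped)
    line through the image center."""
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # signed side of a line through the center with the given slope
    side = (xs - w / 2) + slope * (ys - h / 2)
    return saliency * (side < 0), saliency * (side >= 0)

def make_episode(image, saliency, augment_support, augment_query):
    # The support sees one half of the saliency mask, the query keeps the other
    # half, and each branch receives its own augmentations (augment_* are
    # placeholders for the two augmentation pipelines).
    support_mask, query_mask = split_saliency(saliency, slope=np.random.uniform(-1, 1))
    support_img, support_mask = augment_support(image, support_mask)
    query_img, query_mask = augment_query(image, query_mask)
    return (support_img, support_mask), (query_img, query_mask)
```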

I tried to explain our approach as clearly as possible. For further details, you can also check our paper, which is available at https://arxiv.org/abs/2110.12207.

TNA8 commented 2 years ago

Thanks. So for training, pairs of an input image and its saliency image are enough, and we don't need to have a mask for every image. Some masks for testing and validation are enough. Am I right?

AhmetSencan commented 2 years ago

Exactly. However, while creating the image list files, we added the saliency names after the image and ground truth mask paths, so the parser expects some string ("JPEGImages/2008_000008.jpg SegmentationClassAug/2008_000008.png saliency_unsupervised_model/2008_000008.png", the one in the middle). Since this is not necessary, the code could be updated so that it only uses a text file with the image path and the saliency path. Another solution would be to provide a simple placeholder string, since it is not used; that is, a line in your train_list.txt file could be "JPEGImages/2008_000008.jpg abc saliency_unsupervised_model/2008_000008.png".
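For example, a small helper along these lines (purely illustrative; data_root and the output path are placeholders you should adapt) could generate such a list with a dummy middle column:

```python
import os

data_root = "path/to/your/data_root"          # placeholder, set to your own root
image_dir = "JPEGImages"
saliency_dir = "saliency_unsupervised_model"

# the lists/pascal/ directory must already exist (or pick another output path)
with open("lists/pascal/train_masksplit.txt", "w") as f:
    for image_name in sorted(os.listdir(os.path.join(data_root, image_dir))):
        stem, _ = os.path.splitext(image_name)
        # the middle column is the ground-truth mask path in the original lists,
        # but it is not used during training, so a dummy token is fine
        f.write(f"{image_dir}/{image_name} abc {saliency_dir}/{stem}.png\n")
```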