chunbolang / BAM

Official PyTorch Implementation of Learning What Not to Segment: A New Perspective on Few-Shot Segmentation (CVPR'22 Oral & TPAMI'23).
MIT License

Training dataset of the base learner #7

Closed · Jarvis73 closed this issue 2 years ago

Jarvis73 commented 2 years ago

Hi, Lang.

It is an interesting work and I have some questions about the dataset of the base learner. In general, an FSS method uses the data split:

| split0 | split1 | split2 | split3 |
| --- | --- | --- | --- |
| class 1-5 | class 6-10 | class 11-15 | class 16-20 |
| train_s0 | train_s1 | train_s2 | train_s3 |
| val_s0 | val_s1 | val_s2 | val_s3 |

When we train on classes 6-20 (train_s1+train_s2+train_s3) and test on classes 1-5 (val_s0), we don't use the images in train_s0.

Although the images of different splits partially overlap, the images in train_s0 that contain only classes 1-5 should be excluded when meta-training on classes 6-20.

But it seems that the images belonging solely to train_s0 are also used to train the base learner (with an all-zero label), which may be a kind of data leakage. Do you have any ablation experiments on this point? For example, using only train_s1+train_s2+train_s3 to train the base learner.
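To make the proposed exclusion concrete, here is a minimal Python sketch of the filtering rule being suggested. All names here are illustrative, not PFENet's or BAM's actual API: an image is kept for base-learner training only if its foreground contains at least one base class, so images whose foreground is entirely novel (and would otherwise enter training with an all-zero mask) are dropped.

```python
# Hypothetical sketch of the split-filtering rule for PASCAL-5^i fold 0
# (class ids and helper names are illustrative, not the repo's code).
NOVEL = set(range(1, 6))    # fold-0 novel classes: 1-5
BASE = set(range(6, 21))    # base classes: 6-20

def keep_for_base_training(image_classes):
    """Keep an image only if it contains at least one base class.

    image_classes: set of foreground class ids present in the image.
    Images whose foreground is a subset of NOVEL are the ones argued
    to be excluded, since they would otherwise be trained on with an
    all-zero (pure background) label.
    """
    return bool(image_classes & BASE)

# Toy example: three images with different foreground contents.
dataset = {
    "img_a": {2, 3},     # only novel classes -> excluded
    "img_b": {2, 7},     # mixed -> kept (novel pixels masked as background)
    "img_c": {12, 18},   # only base classes -> kept
}
train_list = [name for name, cls in dataset.items() if keep_for_base_training(cls)]
print(sorted(train_list))  # -> ['img_b', 'img_c']
```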

I am curious about this because having too many images of unseen classes in the training set may give the pre-trained model some awareness of those unseen classes even when no positive labels are provided (for example, these images could be used directly for self-supervised learning), which excessively violates the hypothesis of generalizing to unseen classes.

chunbolang commented 2 years ago

@Jarvis73 Hi, thanks for your interest!

When I first started working on FSS and reviewing the PFENet code, I had the same questions as you.

As far as I'm concerned, a more prudent and complete approach would be to remove all training images that contain novel regions. Unfortunately, existing advanced frameworks do not seem to do so. For example, this paper aims to mine latent classes during the training phase, which could be considered novel classes here.

To this end, I added this code on top of the PFENet repo, but it had little impact on the performance of the meta learner (training the meta learner separately, without the base learner). It is unclear to me how utilizing training images with novel regions would affect the results of our paper.

As for the special case you mentioned (an all-zero mask), I think it should be discussed at a broader level, i.e., whether any images with novel regions should be included at all. Perhaps we can understand it this way: when new knowledge is not provided, experts simply treat such regions as non-foreground, or as a known unknown class (KUC) in open-set recognition.
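The "treat novel regions as non-foreground" view can be sketched as a mask-relabeling step for base-learner training. This is an illustrative sketch, not the repo's actual implementation: for fold 0 of PASCAL-5^i, pixels of the novel classes 1-5 are mapped to background (0), the ignore label 255 is preserved, and the base classes 6-20 are renumbered to 1-15.

```python
import numpy as np

# Illustrative mask relabeling for base-learner training on fold 0
# (assumed class layout; not the repo's actual code).
NOVEL = set(range(1, 6))   # fold-0 novel classes: 1-5
IGNORE = 255               # standard PASCAL VOC ignore label

def remap_mask_for_base(mask):
    """Map novel classes to background and renumber base classes 6-20 to 1-15."""
    out = np.zeros_like(mask)
    for cls in np.unique(mask):
        if cls == IGNORE:
            out[mask == cls] = IGNORE   # keep ignore pixels untouched
            continue
        if cls == 0 or cls in NOVEL:
            continue                    # background stays 0; novel -> background
        out[mask == cls] = cls - 5      # base classes 6..20 -> 1..15
    return out

# Toy mask mixing background (0), novel (2, 5), and base (6, 7, 20) pixels.
mask = np.array([[0, 2, 7],
                 [20, 5, 6]])
print(remap_mask_for_base(mask))  # -> [[ 0  0  2] [15  0  1]]
```

Under this relabeling, an image containing only novel classes collapses to an all-zero mask, which is exactly the special case discussed above.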

Anyway, the question you raise is very valuable, and it bothers me as well. Feel free to get in touch!

Regards, Chunbo

Jarvis73 commented 2 years ago

@chunbolang Thanks for your instant reply.

Considering the dataset overlap between splits, I would suggest another dataset, FSS-1000, which provides 1000 classes with separate train/val/test sets and disjoint classes. I think it may be a better benchmark for avoiding the novel-class effect, and could thus highlight the effectiveness of the proposed base learner.

Best, Jianwei

chunbolang commented 2 years ago

This is a good suggestion, and we will consider it in follow-up work. Training a segmentor capable of recognizing so many categories, with each image belonging to only one semantic category, seems both challenging and interesting, haha...

Feel free to contact me for discussion.

Regards, Chunbo

Jarvis73 commented 2 years ago

I think it would be more convincing to train the base learner with the same data as the meta learner in each fold. Could you provide some results (for example, on val_s0) from training the base learner only on the images in train_s1/s2/s3, and then meta-learning?