Jarvis73 closed this issue 2 years ago
@Jarvis73 Hi, thanks for your interest!
When I first started working on FSS and reviewing the PFENet code, I had the same questions as you.
As far as I'm concerned, a more prudent and complete approach would be to remove any training images that contain novel regions. Unfortunately, advanced frameworks do not seem to do so. For example, this paper aims to mine latent classes during the training phase, which could be considered novel classes here.
To this end, I added this code on top of the PFENet repo, but it had little impact on the performance of the meta learner (training the meta learner separately, regardless of the base learner). It is unclear to me how utilizing training images with novel regions would affect the results of our paper.
As for the special case you mentioned (an all-zero mask), I think it should be discussed at a broader level, i.e., whether any images with novel regions should be included at all. Perhaps we can understand it this way: when new knowledge is not provided, experts simply treat those regions as non-foreground, or as a known unknown class (KUC) in open-set recognition.
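To make the two options above concrete, here is a minimal sketch of such a filter, assuming the PASCAL-5i convention where fold i holds out classes 5i+1..5i+5 as novel. The function name and the list-of-(image, mask) layout are hypothetical for illustration, not PFENet's actual API:

```python
import numpy as np

def novel_classes(fold, n_per_fold=5):
    # PASCAL-5i convention: fold i holds out classes 5*fold+1 .. 5*fold+5
    return set(range(n_per_fold * fold + 1, n_per_fold * fold + n_per_fold + 1))

def filter_episode_images(samples, fold, mode="exclude"):
    """samples: list of (image, mask) pairs; mask is an HxW array of class ids.

    mode="exclude": drop any training image whose mask contains a novel class.
    mode="relabel": keep the image but set novel-class pixels to background (0),
                    i.e. treat them as a known unknown class.
    """
    novel = novel_classes(fold)
    out = []
    for image, mask in samples:
        present = set(np.unique(mask)) & novel
        if not present:
            out.append((image, mask))       # no novel regions: always keep
        elif mode == "relabel":
            clean = mask.copy()
            clean[np.isin(clean, list(novel))] = 0
            out.append((image, clean))      # novel pixels mapped to background
        # mode="exclude": skip images containing novel regions entirely
    return out
```

The "exclude" mode is the stricter policy discussed above; "relabel" corresponds to the all-zero-mask treatment, where novel regions survive as unlabeled background.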
Anyway, the question you raise is very valuable and has bothered me as well. Feel free to contact me!
Regards, Chunbo
@chunbolang Thanks for your instant reply.
Considering the dataset overlap between splits, I would suggest another dataset, FSS1000, which provides 1000 classes with separated train/val/test sets and classes. I think it may be a better benchmark for avoiding the novel-class effect and can thus highlight the effectiveness of the proposed base learner.
Best, Jianwei
This is a good suggestion, and we will consider it in our follow-up work. Training a segmenter capable of recognizing so many categories, with each image belonging to only one semantic category, seems both challenging and interesting, haha...
Feel free to contact me for discussion.
Regards, Chunbo
I think it would be more convincing to train the base learner with the same data as the meta learner in each fold. Could you provide some results (for example, on val_s0) from training the base learner with only the images in train_s1, train_s2, and train_s3, followed by meta-learning?
Hi, Lang.
It is an interesting work, and I have some questions about the dataset used by the base learner. In general, an FSS method uses the following data split:
When we train on classes 6-20 (train_s1+train_s2+train_s3) and test on classes 1-5 (val_s0), we don't use the images in train_s0.
Although the images of different splits partially overlap, the images in train_s0 containing only classes 1-5 should be excluded when meta-training on classes 6-20.
But it seems that the images solely owned by train_s0 are also used to train the base learner (with an all-zero label), which may be a kind of data leakage. Do you have any ablation experiments on this point? For example, using only train_s1+train_s2+train_s3 to train the base learner.
I am curious about this point because too many images of unseen classes in the training set may help the pre-trained model develop some awareness of those unseen classes even when no positive labels are given (for example, we could directly use these images for self-supervised learning), which excessively violates the assumption of generalizing to unseen classes.
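The ablation proposed above could be set up roughly as follows. This is a sketch under the PASCAL-5i fold convention; the `image_classes` mapping and the helper name are hypothetical, not taken from the PFENet code. The default mode keeps an image for base-learner training only if it contains at least one base class (i.e., it would appear in some train_sj with j != fold), which drops exactly the all-zero-label images solely owned by the novel split; the `strict` flag additionally drops images that contain any novel region at all:

```python
def base_learner_train_ids(image_classes, fold, strict=False,
                           n_folds=4, n_per_fold=5):
    """image_classes: dict mapping image id -> set of class ids present in it.

    Returns image ids usable for base-learner training on the given fold.
    """
    novel = set(range(n_per_fold * fold + 1, n_per_fold * fold + n_per_fold + 1))
    base = set(range(1, n_folds * n_per_fold + 1)) - novel
    keep = []
    for img, cls in image_classes.items():
        if not (cls & base):
            continue  # solely owned by the novel split: would get an all-zero base label
        if strict and (cls & novel):
            continue  # stricter variant: also drop any image containing novel regions
        keep.append(img)
    return keep
```

Comparing base-learner (and downstream meta-learner) results between the default and `strict` selections would quantify how much the disputed images actually contribute.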