ViTAE-Transformer / SAMRS

The official repo for [NeurIPS'23] "SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model"

About your data for semantic segmentation #5

Open lywang76 opened 1 year ago

lywang76 commented 1 year ago

Hello,

After going through your data, it seems you only labeled objects that have boxes. Background regions such as sky or water are not labeled.

Therefore, I am curious how your data can be used for semantic segmentation as you claimed in your conclusion on page 6. Thanks,

DotWang commented 1 year ago

@lywang76 SAMRS is transformed from remote sensing object detection datasets. Sky and water usually lack a suitable prompt, such as a bounding box with a corresponding category, so we cannot obtain their masks with SAM. Therefore, SAMRS is closer to a dataset for object segmentation, such as iSAID, than to one for scene labeling (e.g., ISPRS Potsdam).
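(Editorial aside: the transformation described above can be sketched as merging SAM's box-prompted binary masks into one semantic map, with un-prompted pixels left as background. The following is a minimal hypothetical NumPy illustration, not the released SAMRS code; the mask arrays and category IDs are made up.)

```python
import numpy as np

def merge_instance_masks(masks, category_ids, shape, background=255):
    """Merge binary instance masks (e.g. from SAM box prompts) into one
    semantic map; pixels covered by no mask keep the background value."""
    semantic = np.full(shape, background, dtype=np.uint8)
    for mask, cat in zip(masks, category_ids):
        semantic[mask] = cat  # later masks overwrite earlier ones on overlap
    return semantic

# Two toy 4x4 instance masks for categories 0 (e.g. "plane") and 1 (e.g. "ship")
m0 = np.zeros((4, 4), dtype=bool); m0[0:2, 0:2] = True
m1 = np.zeros((4, 4), dtype=bool); m1[2:4, 2:4] = True
sem = merge_instance_masks([m0, m1], [0, 1], (4, 4))
print(sem[0, 0], sem[3, 3], sem[0, 3])  # 0 1 255
```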

DotWang commented 1 year ago

@lywang76 Heh, I disagree. Since your attitude is this unfriendly, I will stop using English as well. Besides its instance segmentation labels, iSAID also comes with matching semantic segmentation labels, and they have long been in use. Here are a few papers you can look at:

  1. Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery
  2. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
  3. RingMo: A Remote Sensing Foundation Model with Masked Image Modeling

The final generated SAMRS is much like iSAID: pixels without a category are the background class. Is there any problem with that?

lywang76 commented 1 year ago

Hello,

I'm reaching out regarding page 6 of your SAMRS paper, where you claim that your data can be used for semantic segmentation. As readers, we hope to obtain generic semantic segmentation labels for each pixel, rather than a background label for everything else.

The first paper you listed mentions that "Geospatial object segmentation is a particular semantic segmentation task." This clarifies that your current data does not support every semantic segmentation task but instead focuses on a specific sub-branch.

Moreover, the second paper you listed states that "Aerial Image Segmentation is a particular semantic segmentation problem and has several challenging characteristics that general semantic segmentation does not have."

It's essential to note that your conclusion on page 6 does not explicitly mention that the dataset is designed for a particular semantic segmentation task. As a new user of your data, we have the right to question this claim and seek clarification.

I have removed my previous post to avoid any confusion and promote a productive discussion.

Thank you for your attention to this matter.

DotWang commented 1 year ago

@lywang76

Usually, people don't care whether a semantic segmentation dataset has a background class, since either way it can be used to train a model. Here are two examples:

CV field: Dual Attention Network for Scene Segmentation

RS field: RSSFormer: Foreground Saliency Enhancement for Remote Sensing Land-Cover Segmentation

In summary, from your sentences I think you may not be familiar with semantic segmentation. Even in classical natural image segmentation datasets, such as Pascal VOC 2012, Pascal Context, COCO-Stuff, Cityscapes, and ADE20K, backgrounds also exist (Pascal Context, Cityscapes, and ADE20K can still be considered fully annotated datasets because of their small background proportion). Among RS datasets, the boundary version of ISPRS Potsdam still has undefined pixels. I highly recommend you check out VOC 2012.

lywang76 commented 1 year ago

I'm sure you're an expert in this field, and I appreciate your time in answering my questions. Please help me clarify a few more questions: 1) In the ADE20K dataset, why do the annotations assign distinct values to stuff classes like tree, ground, and road? Please see this ADE20K color map for verification: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit#gid=0.

2) In your data, everything without a box is marked as background (value 255). How does this support the general semantic segmentation task of identifying stuff classes like sky, ground, and road, which people also care about in aerial image semantic segmentation?
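(Editorial aside: a background/ignore value like 255 is conventionally skipped by both losses and metrics, so it does not contribute to per-class scores. Below is a hedged NumPy sketch of mIoU with an ignore index, written for illustration only; it is not the SAMRS evaluation code, and the toy arrays are made up.)

```python
import numpy as np

def miou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over annotated pixels only; pixels labeled
    ignore_index in the target are excluded from the score."""
    valid = target != ignore_index
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = (p | t).sum()
        if union:
            ious.append((p & t).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

target = np.array([[0, 0, 255], [1, 1, 255]])
pred   = np.array([[0, 1, 0],   [1, 1, 0]])  # the 255 column is ignored
print(round(miou(pred, target, num_classes=2), 3))  # 0.583
```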

DotWang commented 1 year ago

@lywang76

  1. ADE20K is manually annotated by experts. In principle, you can annotate any classes you want by hand. SAMRS is automatically generated by transformation, so its categories depend on the original RS object detection datasets. I don't understand why tree and water must be involved in an RS segmentation dataset. In other words, if the original RS object detection dataset annotates a tree class, then SAMRS may include it.

  2. Current SAMRS supports the categories it contains. In fact, in the algorithm community, we don't care about model accuracy on specific classes; we only compare overall performance using OA, mIoU, etc.

  3. What is "general semantic segmentation"? Do you mean Pascal VOC 2012, COCO-Stuff, and iSAID are not general semantic segmentation datasets? In fact, as far as I know, nobody draws such a distinction between VOC 2012 and ADE20K.

  4. If you insist on identifying trees, I have two ways:

lywang76 commented 1 year ago

Expert, thanks for your time answering my questions! I had already read those papers after working with your data on semantic segmentation and getting poor results. Please help me clarify more. 1) In your words, "I don't understand why tree and water must be involved in an RS segmentation dataset?" Here is the LoveDA dataset from Wuhan University: LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

[image]

Why does a paper from your university and in the same domain care about trees and water here?

DotWang commented 1 year ago

@lywang76 Vegetation, road, and water are common RS classes, so it is normal for LoveDA to annotate these categories. But this does not mean that trees and water must appear in every RS segmentation dataset.

Following your own question: why does iSAID, which is also from my university, not annotate them?

[image] [image]

lywang76 commented 1 year ago

Expert, thanks again for your knowledge and explanation. Doesn't your university's iSAID paper title clearly demonstrate that iSAID is intended for "instance segmentation" instead of semantic segmentation? That is why iSAID didn't care about stuff classes like sky, water, and trees.

I would like to emphasize that my question was specifically about semantic segmentation, which your data claims to support. In light of this, I had expected to see stuff classes for sky, water, and trees included in the dataset. Unfortunately, they are not present, which I have now learned.

DotWang commented 1 year ago

@lywang76 I understand what you mean. As far as I know, nobody says semantic segmentation must simultaneously contain stuff and thing classes; it only involves pixel-level classification. Instance segmentation further distinguishes different objects even if they belong to the same category.

DotWang commented 1 year ago

@lywang76 I have emphasized that:

If you insist on identifying trees.

You can manually transform other object detection datasets that have labeled trees (if you can find them) using our codes (will be released after a few days).

Please refer to other papers that use GroundingDINO + SAM. For example: (1) Text2seg: Remote sensing image semantic segmentation via text-guided visual foundation models. (2) The segment anything model (sam) for remote sensing applications: From zero to one shot.

We don't care about specific categories, which are determined by the source detection dataset.

DotWang commented 1 year ago

@lywang76

In addition, this dataset was initially built for pretraining, where not all pixels need to be annotated. In our experiments, models pretrained on SAMRS achieved better performance on existing semantic segmentation datasets, regardless of whether those datasets contain stuff classes.