ViTAE-Transformer / ViTAE-Transformer-Remote-Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

About the IMP weights on change detection #5

Closed: lauraset closed this issue 1 year ago

lauraset commented 2 years ago

Hello, @DotWang. Your work is great. The results in your paper show that BIT with IMP-ViTAEv2-S weights performs best. So I wonder whether the IMP pretrained weights used for change detection will be released. Thank you very much.

DotWang commented 2 years ago

The IMP weights have been released in the ViTAE-Transformer repo; please see https://github.com/ViTAE-Transformer/ViTAE-Transformer/tree/main/Image-Classification

lauraset commented 2 years ago

Hi, @DotWang. Thank you very much. I got it. I still have a question about the lower performance of RSP compared to IMP in semantic segmentation and change detection. You mentioned two reasons: the dataset volume and the task granularity. However, intuitively, the data distribution of MillionAID is closer to the datasets used in remote sensing (e.g., Potsdam). Is the lower performance of RSP related to the transformer's heavy reliance on large amounts of samples? On the other hand, the task granularity gap exists for both RSP and IMP, so I feel that this reason may not explain the lower performance of RSP. Instead, it seems to show that RSP may only be effective for classification and cannot generalize well to segmentation. Maybe there are some errors in my statement, so I would like to know your opinion. Thank you very much.

DotWang commented 2 years ago

Let me first explain the task granularity. We evaluated IMP and RSP on four tasks: classification, detection, segmentation, and change detection (CD). The experimental results show that RSP performs better than IMP on the first two tasks, not only on classification. Intuitively, classification and detection operate at the scene level and the object level, respectively, so the features they require are similar, which makes transferring the RSP weights easier. Segmentation operates at the pixel level and, compared with detection, requires more detailed semantic information. Judging from the task definition, CD may lie between detection and segmentation.

As for the data volume: here, volume means not only the number of images but also the number of categories. Our pretraining dataset, MillionAID, has only 51 classes, far fewer than ImageNet-1K. Having so few categories reduces the dataset's complexity, which restricts model performance. As you can see, our pretraining accuracy can reach 98%, which is almost impossible in ImageNet-1K training. In this case, the model may not learn representations as universal and detailed as those from IMP. In my opinion, RSP may perform better than IMP once the pretraining dataset becomes more challenging.
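To give a rough feel for the category-count gap, here is a minimal sketch that compares the two label spaces by chance-level accuracy and by the entropy of a uniform label distribution. This is only a crude proxy for dataset complexity, not a measure used in the paper, and it assumes perfectly balanced classes, which neither dataset guarantees. The function names are hypothetical.

```python
import math


def uniform_label_entropy_bits(num_classes):
    """Entropy (in bits) of a label drawn uniformly from num_classes classes."""
    return math.log2(num_classes)


def chance_accuracy(num_classes):
    """Accuracy of random guessing when all classes are equally likely."""
    return 1.0 / num_classes


# Class counts as stated in the thread: 51 for MillionAID, 1000 for ImageNet-1K.
for name, k in (("MillionAID", 51), ("ImageNet-1K", 1000)):
    print(f"{name}: {k} classes, "
          f"{uniform_label_entropy_bits(k):.2f} bits per label, "
          f"chance accuracy {chance_accuracy(k):.4f}")
```

Under this assumption, each ImageNet-1K label carries several more bits of information than a MillionAID label, which is one way to read the statement that fewer categories make the pretraining task easier.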

I also noticed that you mentioned the Potsdam dataset. For this dataset, spectral differences may also affect performance. Since we use the IR-R-G channels, the gap between evaluation and pretraining is larger than for other RS datasets.
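The channel mismatch can be illustrated with a small sketch that rearranges a 4-band pixel into either an RGB or an IR-R-G composite. The band layout (R, G, B, IR at indices 0 to 3), the helper names, and the pixel values are all assumptions for illustration; the actual data loading in the repo may differ.

```python
# Hypothetical band layout for a 4-band tile (an assumption, not taken from the repo).
BAND_INDEX = {"R": 0, "G": 1, "B": 2, "IR": 3}


def compose(pixel, order):
    """Return the pixel's values rearranged into the requested band order."""
    return tuple(pixel[BAND_INDEX[name]] for name in order)


def compose_image(image, order):
    """Apply compose() to every pixel of an image stored as rows of pixel tuples."""
    return [[compose(pixel, order) for pixel in row] for row in image]


# A 1x2 toy image with made-up values, stored as (R, G, B, IR) per pixel.
toy = [[(10, 20, 30, 200), (11, 21, 31, 201)]]

rgb = compose_image(toy, ("R", "G", "B"))     # layout an RGB-pretrained model saw
irrg = compose_image(toy, ("IR", "R", "G"))   # composite mentioned above
```

In the IR-R-G composite, the first input channel carries infrared values where an RGB-pretrained network saw red, so the first-layer filters receive different statistics than they did during pretraining.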

In summary:

lauraset commented 1 year ago

Hello, @DotWang. Thank you for your detailed explanation. I got it.