ViTAE-Transformer / ViTAE-Transformer-Remote-Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

questions about exp. of semantic seg. #9

Closed · Li-Qingyun closed this 2 years ago

Li-Qingyun commented 2 years ago

Hi, thanks for your great work and codebase.

The batch size is 8 in the paper but 4 in the config of Swin-T-IMP + UperNet, and I did not find any description of the number of GPUs for the semantic segmentation subsection. In the README.md for semantic segmentation, the command is:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py \
    configs/upernet/upernet_our_r50_512x512_80k_potsdam_epoch300.py \
    --launcher 'pytorch'

which seems to set the number of processes per node to 1? Or is your command meant for 2 single-GPU nodes with batch size 4 each (2x4)?

DotWang commented 2 years ago

batchsize = samples_per_gpu * gpu_number

samples_per_gpu is set in configs/_base_/datasets/xxxx.py (xxxx is the dataset you use)

gpu_number is controlled by CUDA_VISIBLE_DEVICES,

while --nproc_per_node = gpu_number

For example, the samples_per_gpu in potsdam.py has been set to 8

so if you want to set batchsize=16 and run on GPUs 0 and 1, you can use

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=40001 tools/train.py \
    configs/upernet/upernet_our_r50_512x512_80k_potsdam_epoch300.py \
    --launcher 'pytorch'

In configs/swin/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py, we reset samples_per_gpu to 4.
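For reference, here is a hedged sketch of what such an override typically looks like in an mmseg-style config (the _base_ file list below is an assumption for illustration, not copied from the repo):

# Hypothetical excerpt of upernet_swin_tiny_*_potsdam.py (mmseg config style).
_base_ = [
    '../_base_/models/upernet_swin.py',
    '../_base_/datasets/potsdam.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_80k.py',
]

# Override the per-GPU batch size inherited from the dataset base config.
# Effective batch size = samples_per_gpu * number of GPUs.
data = dict(samples_per_gpu=4)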

Thus, if you need batchsize=8, please use

CUDA_VISIBLE_DEVICES=x,y python -m torch.distributed.launch --nproc_per_node=2 --master_port=xxxxx tools/train.py \
    configs/upernet/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py \
    --launcher 'pytorch'
Li-Qingyun commented 2 years ago

@DotWang Thanks for your reply~

I have trained with configs/swin/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py; both the 1x8 and 2x4 strategies were tested, but they only achieved the following eval results.

The eval results at each eval interval are as follows ('/' = not recorded):

| iter  | aAcc  | mFscore | mIoU  |
|-------|-------|---------|-------|
| 8000  | 80.81 | /       | 59.73 |
| 16000 | 82.03 | /       | 61.41 |
| 24000 | 82.52 | /       | 61.96 |
| 32000 | 82.72 | /       | 62.41 |
| 40000 | 83.23 | /       | 62.97 |
| 48000 | 83.0  | 75.03   | 62.63 |
| 56000 | 82.69 | 74.63   | 62.27 |
| 64000 | 82.88 | 75.19   | 62.7  |
| 72000 | 83.35 | 75.57   | 63.29 |
| 80000 | 83.3  | 75.58   | 63.23 |

Hence I opened this issue to ask. I'd appreciate your assistance!

DotWang commented 2 years ago

@Li-Qingyun How did you prepare the potsdam dataset?

This dataset has two image versions, RGB and IR-R-G,

and the labels also come in two versions: with or without boundary.

In our implementation, we use '3_Ortho_IRRG.zip' and '5_Labels_all.zip'.

In addition, you can check the labels.

In our experiment, the labels in '5_Labels_all.zip' range from 0-5, so we directly ignore class 5.

The other label version additionally includes an undefined category.

Note that we don't use the transformation function provided by mmsegmentation.

If you use it, you may need to adjust the corresponding settings,

such as:

- whether to set reduce_zero_label in configs/_base_/datasets/potsdam.py;
- the num_classes and ignore_index settings in configs/swin/your config file;
- the dataset file in mmseg/datasets/your dataset file.

We have not described these in the readme.md since they are highly customized.
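For concreteness, a hedged sketch of the kind of overrides involved (mmseg-style fragments; these are not the repo's actual files, and the exact values depend on which label zip and conversion you use):

# Hypothetical fragment of configs/_base_/datasets/potsdam.py: control label shifting.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=False),  # True if labels run 1-6
    # ... remaining transforms (resize, crop, flip, normalize) as in the base file
]

# Hypothetical fragment of configs/swin/<your config>: ignore the 'clutter' class.
model = dict(decode_head=dict(num_classes=5, ignore_index=5))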

Li-Qingyun commented 2 years ago

@DotWang Thanks for your support! I followed the official mmseg guide for preparing the dataset, in which '2_Ortho_RGB.zip' and '5_Labels_all_noBoundary.zip' are required. There is a surprisingly large gap between the segmentation results of RGB w/o boundary and IRRG w/ boundary.

Li-Qingyun commented 2 years ago

Oh, the zips actually used were '4_Ortho_RGBIR.zip' and '5_Labels_for_participants_no_Boundary.zip'.

Li-Qingyun commented 2 years ago

@DotWang Why does each image in RGBIR contain only 3 channels?

DotWang commented 2 years ago

@Li-Qingyun An RGBIR image contains 4 channels: R, G, B, NIR.

You can use skimage to read it.

But since we use ordinary deep models that process 3-channel images, the RGBIR version is usually not used.
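A minimal sketch of the difference, assuming a local RGBIR tile (the file name is hypothetical):

# cv2.imread drops the 4th channel by default, while skimage keeps all 4.
import cv2
from skimage import io

path = 'top_potsdam_2_10_RGBIR.tif'                   # hypothetical file name
img_cv = cv2.imread(path)                             # (H, W, 3): default flag drops NIR
img_cv_all = cv2.imread(path, cv2.IMREAD_UNCHANGED)   # keeps all 4 channels
img_sk = io.imread(path)                              # (H, W, 4): R, G, B, NIR
print(img_cv.shape, img_cv_all.shape, img_sk.shape)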

Li-Qingyun commented 2 years ago

@DotWang Thanks, I used cv2.imread, which reads images in 3-channel mode by default. I know too little about this dataset; thank you for your support. I will fix the data and rerun the experiment.

Li-Qingyun commented 2 years ago

@DotWang Hi, regarding the metrics provided by mmseg:

does 'mFscore' correspond to mF1, and 'aAcc' to OA?

DotWang commented 2 years ago

@Li-Qingyun yes

Li-Qingyun commented 2 years ago

> The other label version additionally includes an undefined category. Note that we don't use the transformation function provided by mmsegmentation. If you use it, you may need to adjust the corresponding settings, such as whether to set reduce_zero_label in configs/_base_/datasets/potsdam.py; the num_classes and ignore_index settings in configs/swin/your config file; and the dataset file in mmseg/datasets/your dataset file. We have not described these in the readme.md since they are highly customized.

I followed the custom potsdam.py in this repo, setting reduce_zero_label=False and ignore_index=5; the dataset was prepared with "Semantic Segmentation/tools/convert_datasets/potsdam.py", and '3_Ortho_IRRG.zip' and '5_Labels_all.zip' were adopted.

My training of UperNet + Swin-T-IMP achieved the OA (91.22), but only about 88.69 mFscore.


I think my setting of reduce_zero_label and ignore_index might be wrong.

I wrote a script to read the annotations (prepared by tools/convert_datasets/potsdam.py) and found that, for '5_Labels_all.zip', the script turns the palette indices into 1~5.

[images: multiclass_mask_all / bin_mask_all]

For '5_Labels_noBoundary.zip', the script turns the palette indices into 0~5, where 0 seems to be the boundary.

[images: multiclass_mask_noboundary / bin_mask_noboundary]

The IRRG image is: [image: 2_10_0_0_512_512_IRRG]

And the CLASSES in both potsdam.py and potsdam_ori.py are:

CLASSES = ('impervious_surface', 'building', 'low_vegetation', 'tree',
           'car', 'clutter')

PALETTE = [[255, 255, 255], [0, 0, 255], [0, 255, 255], [0, 255, 0],
           [255, 255, 0], [255, 0, 0]]

I think the class that should be ignored is 'clutter', isn't it?

And the docstring of the PotsdamDataset class:

@DATASETS.register_module()
class PotsdamDataset(CustomDataset):
    """ISPRS Potsdam dataset.

    In segmentation map annotation for Potsdam dataset, 0 is the ignore index.
    ``reduce_zero_label`` should be set to True. The ``img_suffix`` and
    ``seg_map_suffix`` are both fixed to '.png'.
    """

says 0 is the ignore index and that `reduce_zero_label` should be set to True.

I'm still confused about the dataset preparation and the correct usage for reproducing the reported results as a baseline for my work. I'd appreciate your help, and I'd be willing to open a pull request for the dataset preparation.

Thanks for your quick replies.

The script is as follows:

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

from mmsegmentation.configs_rs._base_.potsdam import data as RGB_data
from mmsegmentation.configs_rs._base_.potsdam_IRRG import data as IRRG_data

from mmseg.datasets import build_dataset
RGB_trainset = build_dataset(RGB_data['train'])
IRRG_trainset = build_dataset(IRRG_data['train'])

# map palette colors to human-readable names
# (fixed: (255, 255, 255) is white, (255, 0, 255) is magenta)
palette_map = {
    '[255, 255, 255]': 'white',
    '[0, 0, 255]': 'blue',
    '[0, 255, 0]': 'green',
    '[255, 0, 0]': 'red',
    '[255, 255, 0]': 'yellow',
    '[255, 0, 255]': 'magenta',
    '[0, 255, 255]': 'cyan',
}

# CLASSES = ('impervious_surface', 'building', 'low_vegetation', 'tree',
#            'car', 'clutter')
CLASSES = ('clutter', 'impervious_surface', 'building', 'low_vegetation',
           'tree', 'car')

palette = RGB_trainset.PALETTE
print({k: palette_map[str(v)] for k, v in enumerate(palette)})

def save_ann_with_custom_palette(ann_path, output_path, ann_name):
    ann = Image.open(ann_path)
    ann_array = np.array(ann)
    print(f'{ann_name}: {np.unique(ann_array)}')
    save_bin_mask(ann_array, ann_name, output_path)
    h, w = ann_array.shape
    classes = np.unique(ann_array)
    out_ann = np.zeros((h, w, 3), dtype=np.uint8)  # uint8 so imshow renders 0-255 RGB correctly
    for cls in classes:
        indices = np.nonzero(ann_array == cls)
        out_ann[indices] = palette[cls]
    plt.figure()
    plt.title(ann_name)
    plt.imshow(out_ann)
    plt.savefig(output_path + f'{ann_name}.png')

def save_bin_mask(ann_array: np.ndarray, remark: str, output_path):
    plt.figure()
    plt.suptitle(f'label {remark}')
    classes = np.unique(ann_array)
    _len = len(classes)
    subplot_w = int(np.ceil(np.sqrt(_len)))
    subplot_h = int(np.ceil(_len / subplot_w))
    gs = gridspec.GridSpec(subplot_h, subplot_w * 2)
    gs.update(wspace=0.8)
    for i, cls in enumerate(np.unique(ann_array)):
        bin_mask = (ann_array == cls).astype(np.float32)
        if _len - i >= subplot_w or _len % 2 == 0:
            plt.subplot(
                gs[i // subplot_w, i % subplot_w * 2: i % subplot_w * 2 + 2])
        else:
            plt.subplot(
                gs[i // subplot_w, i % subplot_w * 2 + 1: i % subplot_w * 2 + 3])
        plt.title(f'{cls}-{CLASSES[cls]}')
        # plt.title(f'class {cls} ({remark})')
        plt.imshow(bin_mask)
    plt.savefig(output_path + f'bin_mask {remark}')

ann0_path = f'/home/lqy/Desktop/DINO_semantic_seg/mmsegmentation/data' \
            f'/potsdam/ann_noboundary/train/2_10_0_0_512_512.png'
ann1_path = f'/home/lqy/Desktop/DINO_semantic_seg/mmsegmentation/data' \
            f'/potsdam/ann_all/train/2_10_0_0_512_512.png'
output_path = '/home/lqy/Desktop/DINO_semantic_seg/develop/dataset/'
save_ann_with_custom_palette(ann0_path, output_path, 'noboundary')
save_ann_with_custom_palette(ann1_path, output_path, 'all')
DotWang commented 2 years ago

@Li-Qingyun

The accuracies of "impervious_surface" in your table are None, which is obviously wrong.

In addition, the categories and their corresponding colors are officially defined, and I suggest you do not change them.

We transform the labels of '5_Labels_all.zip' by direct mapping, since the "Undefined" category does not exist there. Here is our code; note that we use skimage.io to load images:

import numpy as np
from skimage import io

palette = {0: (255, 255, 255),  # Impervious surfaces (white)
           1: (0, 0, 255),      # Buildings (blue)
           2: (0, 255, 255),    # Low vegetation (cyan)
           3: (0, 255, 0),      # Trees (green)
           4: (255, 255, 0),    # Cars (yellow)
           5: (255, 0, 0),      # Clutter (red)
           6: (0, 0, 0)}        # Undefined (black)

invert_palette = {v: k for k, v in palette.items()}

def convert_from_color(arr_3d, palette=invert_palette):
    """ RGB-color encoding to grayscale labels """
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)

    for c, i in palette.items():
        m = np.all(arr_3d == np.array(c).reshape(1, 1, 3), axis=2)
        arr_2d[m] = i

    return arr_2d

def load_img(imgPath):
    """Load an image from imgPath and return it as a numpy array."""
    if imgPath.endswith('.tif'):
        img = io.imread(imgPath)
    else:
        raise ValueError('Only .tif images are expected here')
    return img
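A hedged usage sketch of the two helpers above (the file names are made up for illustration):

# Hypothetical usage: convert one RGB-encoded label tile to a grayscale
# label map (values 0-6) and save it for training.
label_rgb = load_img('top_potsdam_2_10_label.tif')  # (H, W, 3), RGB order
label_2d = convert_from_color(label_rgb)            # (H, W), values in 0..6
io.imsave('2_10_label.png', label_2d)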

Thus we set reduce_zero_label=False, num_classes=5, and ignore_index=5 to ignore the "Clutter" category.

The corresponding transformation in mmseg is

    if to_label:
        color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
                              [255, 255, 0], [0, 255, 0], [0, 255, 255],
                              [0, 0, 255]])

Note: the RGB triplets are reversed (BGR) since mmcv uses OpenCV to read images.

If you use this function, then since '5_Labels_all.zip' doesn't have the black boundary, the labels will be transformed to 1-6 (here, clutter = 6).

(Correspondingly, '5_Labels_noBoundary.zip' will be transformed to 0-6.)

In that case, reduce_zero_label should be set to True (1-6 -> 0-5); then set num_classes=5 and ignore_index=5.
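As a hedged illustration (the class name and kwargs below are assumptions, not the repo's actual file), a dataset class for '5_Labels_all' labels already converted to 0-5 might look like:

# Hypothetical dataset class for labels in the range 0-5
# (clutter = 5, no boundary class), so no zero-label reduction is needed.
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset


@DATASETS.register_module()
class PotsdamAllDataset(CustomDataset):  # hypothetical name
    CLASSES = ('impervious_surface', 'building', 'low_vegetation', 'tree',
               'car', 'clutter')
    PALETTE = [[255, 255, 255], [0, 0, 255], [0, 255, 255], [0, 255, 0],
               [255, 255, 0], [255, 0, 0]]

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.png', seg_map_suffix='.png',
                         reduce_zero_label=False, **kwargs)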

Li-Qingyun commented 2 years ago

@DotWang Thanks for your replies.

I did not find your dataset-preparation script in the repo at first, so I followed the official instructions of mmseg, which seem to mismatch the PotsdamDataset class in this repo's potsdam.py. The potsdam_ori.py is the one that should be used.

I searched for the reduce_zero_label parameter globally and tried to understand how it takes effect. The core logic is as follows:

if self.reduce_zero_label:
    # avoid using underflow conversion
    gt_semantic_seg[gt_semantic_seg == 0] = 255
    gt_semantic_seg = gt_semantic_seg - 1
    gt_semantic_seg[gt_semantic_seg == 254] = 255

which relabels the zero (background) class as 255.
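A tiny worked example of this logic on labels converted from '5_Labels_all.zip' (values 1-6, no zeros present):

import numpy as np

gt = np.array([1, 2, 3, 4, 5, 6], dtype=np.uint8)
gt[gt == 0] = 255      # no effect here: 0 never occurs in this label version
gt = gt - 1            # shift 1-6 down to 0-5
gt[gt == 254] = 255    # would restore any former-zero pixels to 255
print(gt)              # [0 1 2 3 4 5]; clutter is now 5, so ignore_index=5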

And there seem to be two places where reduce_zero_label takes effect:

  1. LoadAnnotations in the pipeline
  2. CustomDataset, for eval metric calculation

Both do the same thing, which makes one wonder whether the operation could be applied twice. I think the training annotations are converted by the one in LoadAnnotations and the validation annotations by the one in CustomDataset. We hardly ever call the eval function to measure segmentation performance on the training set; otherwise the reduce operation would likely be performed twice.

Back to the main point: if a user follows mmseg's official dataset preparation, the labels appear to go through the following mapping process (the color format is RGB):

{0 : (255, 255, 255),  # Impervious surfaces (white)
 1 : (0, 0, 255),      # Buildings (blue)
 2 : (0, 255, 255),    # Low vegetation (cyan)
 3 : (0, 255, 0),      # Trees (green)
 4 : (255, 255, 0),    # Cars (yellow)
 5 : (255, 0, 0),      # Clutter (red)
 6 : (0, 0, 0)}        # Undefined (black)

        |
        v   transformation in `convert_datasets/potsdam.py`

{0 : (0, 0, 0),        # Undefined (black)
 1 : (255, 255, 255),  # Impervious surfaces (white)
 2 : (0, 0, 255),      # Buildings (blue)
 3 : (0, 255, 255),    # Low vegetation (cyan)
 4 : (0, 255, 0),      # Trees (green)
 5 : (255, 255, 0),    # Cars (yellow)
 6 : (255, 0, 0)}      # Clutter (red)

        |
        v   reduce_zero_label in `LoadAnnotations`

{0 : (255, 255, 255),  # Impervious surfaces (white)
 1 : (0, 0, 255),      # Buildings (blue)
 2 : (0, 255, 255),    # Low vegetation (cyan)
 3 : (0, 255, 0),      # Trees (green)
 4 : (255, 255, 0),    # Cars (yellow)
 5 : (255, 0, 0),      # Clutter (red)
 255 : (0, 0, 0)}      # Undefined (black)

Hence, in the official potsdam_ori.py, ignore_index=255. It uses '5_Labels_noBoundary.zip', whose converted labels (the second mapping above) are [0 1 2 3 4 5 6]. When setting reduce_zero_label=True, the labels become [255 0 1 2 3 4 5], hence ignore_index was set to 255, which makes the Undefined background the ignored category. However, shouldn't both 5 and 255 actually be ignored?

In the ViTAE-RS repo, for '5_Labels_all.zip' the converted labels are [1 2 3 4 5 6]. When setting reduce_zero_label=True, the labels become [0 1 2 3 4 5], hence ignore_index was set to 5. If reduce_zero_label=False, the labels stay [1 2 3 4 5 6], and the index to ignore would be 6, but then 0 is a spurious extra label. Hence the transformation should be removed in this repo to keep the original [0 1 2 3 4 5], where 5 is the ignored Clutter class and the Undefined background class 6 is not annotated, so ignore_index=5.
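As a quick sanity check (the paths below are assumptions about the converted data layout), one can print the union of label values over a few converted annotation files to see which of these ranges applies:

import numpy as np
from PIL import Image
from pathlib import Path

# Hypothetical layout: converted annotation PNGs under data/potsdam/ann_dir.
ann_dir = Path('data/potsdam/ann_dir/train')
values = set()
for p in sorted(ann_dir.glob('*.png'))[:20]:   # a handful of tiles is enough
    values |= set(np.unique(np.array(Image.open(p))).tolist())
print(sorted(values))  # e.g. [0..5], [1..6], or [0..6]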

DotWang commented 2 years ago

@Li-Qingyun Haha, the Potsdam preparation script was used in our previous projects, so we adopted it instead of the mmseg transformation in this work. We did not upload the script since these steps are highly customized for each user.

Most of your understanding is right. The transformation convert_datasets/potsdam.py comes from the original mmseg; we didn't even use that folder and simply uploaded it as-is.

The mIoUs shown on the mmseg site include all categories except "Undefined". However, in the remote sensing literature, "Clutter" is also treated as background and does not take part in the metric calculation. In fact, whether or not to mask this category during training is fine either way. For convenience, we also ignore it when training models.

Li-Qingyun commented 2 years ago

@DotWang Thank you very much for your help and your detailed, patient explanation. I finally achieved the results in the paper and can focus on my own research. Wish you all the best with yours. Thank you!