`batchsize = samples_per_gpu * gpu_number`

`samples_per_gpu` is set in configs/_base_/datasets/xxxx.py (xxxx is the dataset you used), and `gpu_number` is controlled by `CUDA_VISIBLE_DEVICES`, while `--nproc_per_node` should equal `gpu_number`.

For example, the `samples_per_gpu` in potsdam.py has been set to 8, so if you want batchsize=16 on GPUs 0 and 1, you can use:
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=40001 tools/train.py \
    configs/upernet/upernet_our_r50_512x512_80k_potsdam_epoch300.py \
    --launcher 'pytorch'
```
In configs/swin/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py we reset `samples_per_gpu=4`. Thus, if you need batchsize=8, please use:
```bash
CUDA_VISIBLE_DEVICES=x,y python -m torch.distributed.launch --nproc_per_node=2 --master_port=xxxxx tools/train.py \
    configs/upernet/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py \
    --launcher 'pytorch'
```
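For reference, a minimal sketch of where `samples_per_gpu` lives in an mmseg-style dataset config and how it combines with the process count (field names follow mmseg conventions; the values are illustrative, not the repo's exact file):

```python
# Illustrative excerpt in the style of configs/_base_/datasets/potsdam.py
data = dict(
    samples_per_gpu=4,  # per-GPU batch size
    workers_per_gpu=4,  # dataloader workers per GPU
)

# Effective batch size when launching with --nproc_per_node=2:
effective_batch = data['samples_per_gpu'] * 2  # = 8
print(effective_batch)
```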
@DotWang Thanks for your reply~
I have trained with configs/swin/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py; both the 1x8 and 2x4 strategies were tested, but they only achieved the following eval results.
The eval results at each eval_interval are as follows:

| iter | aAcc | mFscore | mIoU |
|---|---|---|---|
| 8000 | 80.81 | / | 59.73 |
| 16000 | 82.03 | / | 61.41 |
| 24000 | 82.52 | / | 61.96 |
| 32000 | 82.72 | / | 62.41 |
| 40000 | 83.23 | / | 62.97 |
| 48000 | 83.0 | 75.03 | 62.63 |
| 56000 | 82.69 | 74.63 | 62.27 |
| 64000 | 82.88 | 75.19 | 62.7 |
| 72000 | 83.35 | 75.57 | 63.29 |
| 80000 | 83.3 | 75.58 | 63.23 |
Hence I opened this issue to ask. I'd appreciate your assistance!
@Li-Qingyun How did you prepare the Potsdam dataset?
This dataset contains two image versions, RGB and IR-R-G, and the labels also have two versions: with or without boundary. In our implementation, we use '3_Ortho_IRRG.zip' and '5_Labels_all.zip'. In addition, you should check the labels: in our experiment, the labels in '5_Labels_all.zip' range from 0-5, so we directly ignore class 5. The other kind of label additionally includes an undefined category.
Note that we don't use the transformation function provided by mmsegmentation. If you use it, you may need to adjust the corresponding settings, such as: whether to `reduce_zero_label` in configs/_base_/datasets/potsdam.py; the `num_classes` and `ignore_index` settings in your config file under configs/swin/; and the dataset file under mmseg/datasets/. We have not described these in the readme.md since they are highly customized.
@DotWang Thanks for your support! I followed mmseg's official dataset preparation guide, in which '2_Ortho_RGB.zip' and '5_Labels_all_noBoundary.zip' are required. There is a huge difference between the segmentation results of RGB w/o boundary and IRRG w/ boundary.
Oh, the zips I actually used are '4_Ortho_RGBIR.zip' and '5_Labels_for_participants_no_Boundary.zip'.
@DotWang Why does each image in RGBIR contain 3 channels?
@Li-Qingyun An RGBIR image contains 4 channels: R, G, B, NIR; you can use skimage to read it. But since we use ordinary deep models that process 3-channel images, the RGBIR version is usually not used.
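For illustration, a quick way to see the channel behavior (the file name below is hypothetical; `skimage.io.imread` keeps all bands, while `cv2.imread` converts to a 3-channel image by default):

```python
import cv2
from skimage import io

path = 'top_potsdam_2_10_RGBIR.tif'  # hypothetical tile path

img_sk = io.imread(path)
print(img_sk.shape)  # (H, W, 4): R, G, B, NIR all preserved

img_cv = cv2.imread(path)
print(img_cv.shape)  # (H, W, 3): the default flag drops the 4th band

img_cv_all = cv2.imread(path, cv2.IMREAD_UNCHANGED)
print(img_cv_all.shape)  # (H, W, 4): keeps all bands
```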
@DotWang Thanks, I used cv2.imread, which reads images in 3-channel mode. I know too little about this dataset; thank you for your support. I will adjust the data and rerun the experiment.
@DotWang Hi, about the metrics provided by mmseg: does 'mFscore' correspond to mF1, and does 'aAcc' correspond to OA?
@Li-Qingyun yes
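For concreteness, a small sketch of how these two metrics are usually defined from a confusion matrix (this mirrors the standard definitions; it is not mmseg's exact implementation):

```python
import numpy as np

# toy confusion matrix: rows = ground truth, cols = prediction
cm = np.array([[50., 2., 3.],
               [4., 40., 1.],
               [2., 3., 45.]])

oa = np.trace(cm) / cm.sum()  # aAcc / OA: overall pixel accuracy

precision = np.diag(cm) / cm.sum(axis=0)  # per-class precision
recall = np.diag(cm) / cm.sum(axis=1)     # per-class recall
f1 = 2 * precision * recall / (precision + recall)
mf1 = f1.mean()  # mFscore / mF1: mean of per-class F1 scores

print(f'OA (aAcc): {oa:.4f}, mF1 (mFscore): {mf1:.4f}')
```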
> The other kind of label additionally includes an undefined category. Note that we don't use the transformation function provided by mmsegmentation. If you use it, you may need to adjust the corresponding settings, such as: whether to `reduce_zero_label` in configs/_base_/datasets/potsdam.py; the `num_classes` and `ignore_index` settings in your config file under configs/swin/; and the dataset file under mmseg/datasets/. We have not described these in the readme.md since they are highly customized.
I followed the custom potsdam.py in the repo, setting `reduce_zero_label=False` and `ignore_index=5`. The dataset was prepared with Semantic Segmentation/tools/convert_datasets/potsdam.py; '3_Ortho_IRRG.zip' and '5_Labels_all.zip' were adopted. My training achieved the OA (91.22) of UperNet+Swin-T-IMP, but only about 88.69 mFscore. I think my settings of `reduce_zero_label` and `ignore_index` might be wrong.
I wrote a script to read the annotations (prepared by tools/convert_datasets/potsdam.py) and found that for '5_Labels_all.zip' the script turns the labels into 1~5, while for '5_Labels_noBoundary.zip' it turns them into 0~5, in which 0 seems to be the boundary.
The IRRG image is: [image attachment]
And the `CLASSES` and `PALETTE` in both potsdam.py and potsdam_ori.py are:

```python
CLASSES = ('impervious_surface', 'building', 'low_vegetation', 'tree',
           'car', 'clutter')
PALETTE = [[255, 255, 255], [0, 0, 255], [0, 255, 255], [0, 255, 0],
           [255, 255, 0], [255, 0, 0]]
```
I think the class that should be ignored is 'clutter', right?
And the docstring of PotsdamDataset:

```python
@DATASETS.register_module()
class PotsdamDataset(CustomDataset):
    """ISPRS Potsdam dataset.

    In segmentation map annotation for Potsdam dataset, 0 is the ignore index.
    ``reduce_zero_label`` should be set to True. The ``img_suffix`` and
    ``seg_map_suffix`` are both fixed to '.png'.
    """
```
says 0 is the ignore index and `reduce_zero_label` should be set to True.
I'm still confused about the dataset preparation and the correct usage to reproduce the reported results as a baseline for my work. I'd appreciate your help and would be willing to open a pull request for the dataset preparation. Thanks for your quick replies.
The script is as follows:
```python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

from mmsegmentation.configs_rs._base_.potsdam import data as RGB_data
from mmsegmentation.configs_rs._base_.potsdam_IRRG import data as IRRG_data
from mmseg.datasets import build_dataset

RGB_trainset = build_dataset(RGB_data['train'])
IRRG_trainset = build_dataset(IRRG_data['train'])

palette_map = {
    '[255, 255, 255]': 'black',
    '[0, 0, 255]': 'blue',
    '[0, 255, 0]': 'green',
    '[255, 0, 0]': 'red',
    '[255, 255, 0]': 'yellow',
    '[255, 0, 255]': '',
    '[0, 255, 255]': 'cyan',
}
# CLASSES = ('impervious_surface', 'building', 'low_vegetation', 'tree',
#            'car', 'clutter')
CLASSES = ('clutter', 'impervious_surface', 'building', 'low_vegetation',
           'tree', 'car')
palette = RGB_trainset.PALETTE
print({k: palette_map[str(v)] for k, v in enumerate(palette)})


def save_ann_with_custom_palette(ann_path, output_path, ann_name):
    ann = Image.open(ann_path)
    ann_array = np.array(ann)
    print(f'{ann_name}: {np.unique(ann_array)}')
    save_bin_mask(ann_array, ann_name, output_path)
    h, w = ann_array.shape
    classes = np.unique(ann_array)
    out_ann = np.zeros((h, w, 3))
    for cls in classes:
        indices = np.nonzero(ann_array == cls)
        out_ann[indices] = palette[cls]
    plt.figure()
    plt.title(ann_name)
    plt.imshow(out_ann)
    plt.savefig(output_path + f'{ann_name}.png')


def save_bin_mask(ann_array: np.ndarray, remark: str, output_path):
    plt.figure()
    plt.suptitle(f'label {remark}')
    classes = np.unique(ann_array)
    _len = len(classes)
    subplot_w = int(np.ceil(np.sqrt(_len)))
    subplot_h = int(np.ceil(_len / subplot_w))
    gs = gridspec.GridSpec(subplot_h, subplot_w * 2)
    gs.update(wspace=0.8)
    for i, cls in enumerate(np.unique(ann_array)):
        bin_mask = (ann_array == cls).astype(np.float32)
        if _len - i >= subplot_w or _len % 2 == 0:
            plt.subplot(
                gs[i // subplot_w, i % subplot_w * 2: i % subplot_w * 2 + 2])
        else:
            plt.subplot(
                gs[i // subplot_w, i % subplot_w * 2 + 1: i % subplot_w * 2 + 3])
        plt.title(f'{cls}-{CLASSES[cls]}')
        # plt.title(f'class {cls} ({remark})')
        plt.imshow(bin_mask)
    plt.savefig(output_path + f'bin_mask {remark}')


ann0_path = f'/home/lqy/Desktop/DINO_semantic_seg/mmsegmentation/data' \
            f'/potsdam/ann_noboundary/train/2_10_0_0_512_512.png'
ann1_path = f'/home/lqy/Desktop/DINO_semantic_seg/mmsegmentation/data' \
            f'/potsdam/ann_all/train/2_10_0_0_512_512.png'
output_path = '/home/lqy/Desktop/DINO_semantic_seg/develop/dataset/'
save_ann_with_custom_palette(ann0_path, output_path, 'noboundary')
save_ann_with_custom_palette(ann1_path, output_path, 'all')
```
@Li-Qingyun The accuracies of 'impervious_surface' in your table are None, which is obviously wrong. In addition, the categories and their corresponding colors are officially defined, and I suggest that you do not change them. We transform the labels of '5_Labels_all.zip' by direct mapping, since the "Undefined" category does not exist there. Here is our code; note that we use skimage.io to load images:
```python
import numpy as np
from skimage import io

palette = {0: (255, 255, 255),  # Impervious surfaces (white)
           1: (0, 0, 255),      # Buildings (blue)
           2: (0, 255, 255),    # Low vegetation (cyan)
           3: (0, 255, 0),      # Trees (green)
           4: (255, 255, 0),    # Cars (yellow)
           5: (255, 0, 0),      # Clutter (red)
           6: (0, 0, 0)}        # Undefined (black)

invert_palette = {v: k for k, v in palette.items()}


def convert_from_color(arr_3d, palette=invert_palette):
    """RGB-color encoding to grayscale labels."""
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
    for c, i in palette.items():
        m = np.all(arr_3d == np.array(c).reshape(1, 1, 3), axis=2)
        arr_2d[m] = i
    return arr_2d


def load_img(imgPath):
    """
    Load image.

    :param imgPath: path of the image to load
    :return: numpy array of the image
    """
    if imgPath.endswith('.tif'):
        img = io.imread(imgPath)
        # img = tif.read_image()
        # img = tifffile.imread(imgPath)
    else:
        raise ValueError('Install pillow and uncomment line in load_img')
    return img
```
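For example, the conversion could be applied to one label tile like this (a usage sketch for the functions above; the file name is hypothetical, and the printed values assume a '5_Labels_all' tile):

```python
label_rgb = load_img('top_potsdam_2_10_label.tif')  # hypothetical tile path
label_2d = convert_from_color(label_rgb)
print(np.unique(label_2d))  # e.g. [0 1 2 3 4 5] for a '5_Labels_all' tile
```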
Thus we set `reduce_zero_label=False`, `num_classes=5`, and `ignore_index=5` to ignore the "Clutter" category.
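A minimal sketch of how those settings might be written (field locations follow mmseg conventions and are illustrative, not the repo's exact config):

```python
# Illustrative config fragments, assuming mmseg-style configs:
train_ann_loading = dict(type='LoadAnnotations', reduce_zero_label=False)

decode_head = dict(
    num_classes=5,   # 5 valid classes; 'clutter' is excluded
    ignore_index=5,  # pixels labeled 5 ('clutter') are masked out of the loss
)
```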
The corresponding transformation in mmseg is:

```python
if to_label:
    color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
                          [255, 255, 0], [0, 255, 0], [0, 255, 255],
                          [0, 0, 255]])
```

Note: the channel order is reversed (BGR) since mmcv uses OpenCV to read images.
If you use this function, then since '5_Labels_all.zip' doesn't have the black boundary, the labels will be transformed to 1-6 (here, clutter=6). (Correspondingly, '5_Labels_noBoundary.zip' will be transformed to 0-6.) In that case, `reduce_zero_label` should be True (1-6 -> 0-5); then set `num_classes=5` and `ignore_index=5`.
@DotWang Thanks for your replies.
I did not find your dataset preparation script in the repo at first; hence, I followed the official instructions of mmseg, which seem mismatched with the PotsdamDataset class in this repo's potsdam.py. The potsdam_ori.py is the one that should be used.
I searched for the `reduce_zero_label` parameter globally and tried to understand how it takes effect. The core logic is as follows:
```python
if self.reduce_zero_label:
    # avoid using underflow conversion
    gt_semantic_seg[gt_semantic_seg == 0] = 255
    gt_semantic_seg = gt_semantic_seg - 1
    gt_semantic_seg[gt_semantic_seg == 254] = 255
```
which relabels the background class as 255.
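A toy demonstration of that mapping (my own sketch; the input values are chosen to cover the edge cases):

```python
import numpy as np

gt = np.array([0, 1, 2, 6, 255], dtype=np.uint8)

seg = gt.copy()
seg[seg == 0] = 255    # 0 -> 255 first, so the subtraction below cannot underflow
seg = seg - 1          # shift all remaining labels down by one
seg[seg == 254] = 255  # anything that was 255 (or the original 0) ends at 255
print(seg)             # [255   0   1   5 255]
```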
There also seem to be two places where `reduce_zero_label` takes effect, and both do the same thing, which makes me wonder whether the operation could be applied twice. I believe the train annotations are converted by the one in `LoadAnnotations` and the val annotations by the one in `CustomDataset`. It seems we hardly ever call the eval function to verify segmentation performance on the training set; otherwise, the reduce operation would likely be performed twice.
Closer to home: if the user follows mmseg's official dataset preparation, the labels seem to go through the following mapping process (colors are in RGB):
```
{0: (255, 255, 255),  # Impervious surfaces (white)
 1: (0, 0, 255),      # Buildings (blue)
 2: (0, 255, 255),    # Low vegetation (cyan)
 3: (0, 255, 0),      # Trees (green)
 4: (255, 255, 0),    # Cars (yellow)
 5: (255, 0, 0),      # Clutter (red)
 6: (0, 0, 0)}        # Undefined (black)
```

↓ transformation in `convert_datasets/potsdam.py`

```
{0: (0, 0, 0),        # Undefined (black)
 1: (255, 255, 255),  # Impervious surfaces (white)
 2: (0, 0, 255),      # Buildings (blue)
 3: (0, 255, 255),    # Low vegetation (cyan)
 4: (0, 255, 0),      # Trees (green)
 5: (255, 255, 0),    # Cars (yellow)
 6: (255, 0, 0)}      # Clutter (red)
```

↓ `reduce_zero_label` in `LoadAnnotations`

```
{0: (255, 255, 255),  # Impervious surfaces (white)
 1: (0, 0, 255),      # Buildings (blue)
 2: (0, 255, 255),    # Low vegetation (cyan)
 3: (0, 255, 0),      # Trees (green)
 4: (255, 255, 0),    # Cars (yellow)
 5: (255, 0, 0),      # Clutter (red)
 255: (0, 0, 0)}      # Undefined (black)
```
Hence, in the official potsdam_ori.py, it is `ignore_index=255`.
They use 'label_noBoundary.zip', whose converted labels (the second mapping above) are [0 1 2 3 4 5 6]. With `reduce_zero_label=True`, the labels become [255 0 1 2 3 4 5]; hence `ignore_index` was set to 255, which makes the Undefined background the ignored category. However, shouldn't both 5 and 255 actually be ignored?
In the ViTAE-RS repo, 'label_all.zip' is used, whose converted labels are [1 2 3 4 5 6]. With `reduce_zero_label=True`, the labels become [0 1 2 3 4 5]; hence `ignore_index` was set to 5. With `reduce_zero_label=False`, the labels stay [1 2 3 4 5 6] and the ignored index would be 6, but then 0 is an extra, unused label. Hence, the transformation should be removed in this repo to keep the original [0 1 2 3 4 5], where 5 is the ignored Clutter class; the Undefined background class 6 is not annotated, so `ignore_index=5`. (A compact comparison is sketched below.)
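A compact way to check the two variants side by side (my own sketch of the same mapping discussed above):

```python
import numpy as np

def reduce_zero_label(seg):
    """The same mapping as mmseg's reduce_zero_label logic quoted above."""
    seg = seg.astype(np.uint8).copy()
    seg[seg == 0] = 255
    seg = seg - 1
    seg[seg == 254] = 255
    return seg

labels_all = np.array([1, 2, 3, 4, 5, 6], dtype=np.uint8)    # '5_Labels_all.zip' after conversion
labels_nb = np.array([0, 1, 2, 3, 4, 5, 6], dtype=np.uint8)  # '5_Labels_noBoundary.zip' after conversion

print(reduce_zero_label(labels_all))  # [0 1 2 3 4 5] -> set ignore_index=5 (clutter)
print(reduce_zero_label(labels_nb))   # [255 0 1 2 3 4 5] -> ignore_index=255 (boundary), but clutter (5) remains
```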
@Li-Qingyun Haha, the script for preparing the Potsdam dataset was used in our previous projects, so we adopted it instead of the mmseg transformation in this work. We did not upload the script since we think mmseg is highly customized for users. Most of your understanding is right. The transformation convert_datasets/potsdam.py exists in the original mmseg; we didn't actually use that folder and uploaded it as-is.
The mIoUs shown on the mmseg site include all categories except "Undefined". However, in the RS literature, "Clutter" is also considered background and does not take part in the metric calculation. In fact, whether or not to mask this category during training is fine either way. For convenience, we also ignore it when training models.
@DotWang Thank you very much for your help and your detailed, patient explanations. I finally achieved the results in the paper and can focus on my own research. Wish you all the best with your research. Thank you!
Hi, thanks for your great work and codebase.
The batch size is 8 in the paper, but 4 in the config of Swin-T-IMP+UperNet, and I did not find any description of the number of GPUs for the semantic segmentation subsection. In the README.md of semantic seg., the command seems to set the number of GPUs per node to 1. Or is your command meant for 2 single-GPU nodes with batch_size 4 each (2x4)?