VCIP-RGBD / DFormer

[ICLR 2024] DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
https://yinbow.github.io/Projects/DFormer/index.html
MIT License

Doubt about num_classes for NYU Depth v2 Dataset #3

Closed ishmamahmed closed 8 months ago

ishmamahmed commented 10 months ago

Hello, thank you for sharing the code of this great work!!

I have a query about the NYU dataset.

The NYU dataset has 40 classes. So, in the final output layer of all of your decoders, num_classes is set to 40.

When I check the pixel values of the ground truths (0.png, 1.png, etc) inside the NYUDepthv2/Label/ directory, the pixel values range from 0 to 40. This indicates there are 41 classes.

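import cv2
import numpy as np

gt_path = "...../NYUDepthv2_DFormer/Label"

# Print the distinct label values found in the first 10 ground-truth images.
for idx in range(10):
    label_path = f"{gt_path}/{idx}.png"
    # Read the label file itself (not the directory), keeping pixel values as stored.
    label = cv2.imread(label_path, cv2.IMREAD_UNCHANGED)
    unique_values = np.unique(label)
    print(f"Classes in label {idx}: {unique_values}")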

Classes in label 0: [ 0 1 2 3 5 7 12 22 24 26 38 39 40]
Classes in label 1: [ 0 1 3 12 22 24 26 34 38 39 40]
Classes in label 2: [ 0 1 5 7 8 26 29 38 40]
Classes in label 3: [ 0 1 3 5 14 26 40]
Classes in label 4: [ 0 1 3 5 7 8 12 15 22 26 30 34 38 39 40]
Classes in label 5: [ 0 1 2 5 7 22 29 38 39 40]
Classes in label 6: [ 0 1 2 5 15 22 38 39 40]
Classes in label 7: [ 0 1 2 5 8 9 26 37 38 39 40]
Classes in label 8: [ 0 1 2 5 7 8 11 15 22 26 29 38 39 40]
Classes in label 9: [ 0 1 2 3 8 15 22 26 38 39 40]

So can you kindly tell me how you have dealt with the extra class in the ground truth labels? In my code, this discrepancy causes an error in the cross-entropy loss function.

Thank you.

yinbow commented 10 months ago

Hi, thanks for your attention to our work. The value '0' represents the background class, which is excluded from both the mIoU and the loss. To be specific, this line processes the input labels. In the NYU dataset, the background class tends to cover the borders between objects and the edges of the images, as shown in the image below (black is the background class):

[image: temp]
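
To make the convention concrete, here is a small sketch that separates background pixels from class pixels in a raw label map. The array is toy data standing in for one label image, and the names `BACKGROUND`, `raw`, and `valid` are illustrative, not taken from the repository.

```python
import numpy as np

BACKGROUND = 0  # raw label value for background pixels, per the reply above

# Toy 3x4 label map standing in for one file from the Label/ directory.
raw = np.array([[0, 1, 1, 40],
                [0, 5, 5, 40],
                [0, 0, 22, 22]], dtype=np.uint8)

valid = raw != BACKGROUND                  # pixels that would count toward loss and mIoU
classes_present = np.unique(raw[valid])    # class IDs, background excluded
background_fraction = 1.0 - valid.mean()   # share of pixels that are background

print(classes_present, background_fraction)
```

With the background value set aside, the remaining raw values 1 to 40 account for the 40 classes.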

ishmamahmed commented 10 months ago

Thank you very much for your clarification!!

After subtracting 1 from the ground truth, the background (which was previously 0) becomes -1.

And I see you have ignored -1 in the cross-entropy loss using ignore_index:

[image]

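As a rough illustration of what ignoring an index in the loss means, the sketch below computes a mean cross-entropy over only the pixels whose target differs from the ignored value. It is written in NumPy and is not the repository's criterion; `masked_cross_entropy` is a hypothetical helper, and the value 255 is the `C.background` setting quoted later in this thread. The behaviour is comparable in spirit to the `ignore_index` argument of `torch.nn.CrossEntropyLoss`.

```python
import numpy as np

def masked_cross_entropy(logits, target, ignore_index):
    """Mean cross-entropy over entries whose target is not ignore_index.

    logits: float array of shape (N, C); target: integer array of shape (N,).
    """
    keep = target != ignore_index
    if not keep.any():
        return 0.0
    logits = logits[keep]
    target = target[keep].astype(np.int64)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each kept pixel.
    return float(-log_probs[np.arange(target.size), target].mean())

IGNORE = 255                               # assumed ignore value (C.background in the thread)
logits = np.zeros((4, 40))                 # uniform predictions over 40 classes
target = np.array([0, 39, IGNORE, 7])      # third pixel is background
loss = masked_cross_entropy(logits, target, IGNORE)
print(loss)
```

Without the mask, the target 255 would index past the 40 available class columns, which is the kind of out-of-range label that makes a cross-entropy loss fail.
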
ishmamahmed commented 8 months ago

Hello, I have another question. In the CrossEntropyLoss criterion, you ignore the config.background index, and in this code you have set C.background = 255. But shouldn't the background be -1, since you subtract 1 from the ground truth here?

yinbow commented 8 months ago

The labels are stored as uint8 (range 0 to 255), so subtracting 1 from 0 wraps around to 255 rather than giving -1.
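
The wraparound can be checked directly in NumPy. This is a minimal sketch on a toy array (the variable names are illustrative): subtracting 1 in uint8 sends the background value 0 to 255 and shifts classes 1 to 40 down to 0 to 39, whereas casting to a signed type first would give -1 instead.

```python
import numpy as np

# Toy raw labels: background (0), first class (1), last class (40).
raw = np.array([0, 1, 40], dtype=np.uint8)

# Unsigned 8-bit arithmetic is modulo 256, so 0 - 1 wraps to 255.
shifted_uint8 = raw - np.uint8(1)

# For contrast: with a signed dtype the background would become -1.
shifted_signed = raw.astype(np.int64) - 1

print(shifted_uint8, shifted_uint8.dtype)
print(shifted_signed, shifted_signed.dtype)
```

So an ignore value of 255 matches the shifted labels only while they remain uint8; if the labels are converted to a signed integer type before the subtraction, the ignore value would have to be -1.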