Fourier7754 / AsymFormer

AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation

Label the problem #10


yinyong14 commented 1 month ago

Should I label all of the semantic information in the image? I noticed there is `ignore_index=-1` in train.py. I labeled only the parts of my images that I need to segment and left everything else as background, but when I run eval.py a lot of the background is mis-segmented while the main objects are not segmented at all. After I label the rest of the background and retrain, the segmentation works. What is the reason for this?

Fourier7754 commented 1 month ago

I suspect you intend to train AsymFormer on your custom dataset, which raises the question of how to annotate the data. First, I'll explain why the background class is ignored on the NYUv2 dataset, and then how to train on a custom dataset.

In NYUv2, the background class contains many unlabeled objects. Treating it as a regular label during training could mislead the network's understanding of the other categories, so the standard practice is to ignore the background class during both training and testing in order to assess the network's performance accurately. However, if your task is to distinguish foreground from background, the situation is different: in that case you may not need to ignore your dataset's background class during training and testing.
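For concreteness, here is a minimal sketch of how the label shift interacts with the loss (the shapes and class count follow NYUv2-40; the variable names are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Background is class 0 in the raw NYUv2 labels. Shifting every label
# down by 1 maps background to -1, which the loss skips via ignore_index,
# so unlabeled pixels contribute no gradient.
criterion = nn.CrossEntropyLoss(ignore_index=-1)

num_classes = 40                                # NYUv2-40 foreground classes
logits = torch.randn(2, num_classes, 8, 8)      # (N, C, H, W) network output
target = torch.randint(0, 41, (2, 8, 8))        # raw labels 0..40, 0 = background

loss = criterion(logits, target - 1)            # background 0 -> -1, ignored
```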

In practice: first, during the data annotation phase, mark the background class as class 0. Then, when computing the loss in train.py, drop the label shift, i.e. use `target_scales[0]` instead of `target_scales[0] - 1`. Similarly, during testing in eval.py, remove the `+ 1` from `output = torch.max(pred, 1)[1] + 1`, so that it becomes `output = torch.max(pred, 1)[1]`. A sketch of both changes follows.
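A rough before/after sketch of the two changes, with dummy tensors so it runs standalone (`target_scales[0]` and the `torch.max` line are quoted from this repo; `criterion`, `pred_scales`, and the shapes are placeholder assumptions for illustration):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-1)     # as used for NYUv2 training
pred_scales = [torch.randn(2, 5, 8, 8)]              # dummy logits, 5 classes
target_scales = [torch.randint(0, 5, (2, 8, 8))]     # dummy labels, 0 = background

# train.py -- NYUv2 behaviour (background shifted to -1 and ignored):
loss_nyu = criterion(pred_scales[0], target_scales[0] - 1)
# Custom dataset where background is a real class 0 -- drop the shift:
loss_custom = criterion(pred_scales[0], target_scales[0])

pred = pred_scales[0]
out_nyu = torch.max(pred, 1)[1] + 1     # eval.py, NYUv2: shift back to original ids
out_custom = torch.max(pred, 1)[1]      # custom dataset: keep raw class indices
```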

yinyong14 commented 1 month ago

Thanks for the reply, but sorry, I don't understand why the `+ 1` in `output = torch.max(pred, 1)[1] + 1` is there in the first place, or why removing it is the right thing to do for my dataset.

Fourier7754 commented 1 month ago

In the NYUv2 dataset, class 0 represents the background class. During the training phase, we subtract 1 from all labels and ignore the label -1, which effectively removes the background class from the training process. During the testing phase, we add 1 to all predictions so that the output aligns with the categories in the NYUv2 label image (label 0 in `pred` corresponds to label 1 in the original NYUv2 label image). When training and testing on your own dataset, you may not need these preprocessing and postprocessing steps.
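A tiny worked example of that round trip (four classes for brevity; purely illustrative):

```python
import torch

raw = torch.tensor([0, 1, 2, 3])   # raw NYUv2-style labels, 0 = background
train_target = raw - 1             # -> [-1, 0, 1, 2]; -1 is ignored by the loss
# The network is trained on indices 0..2, so its argmax yields values in 0..2:
pred_idx = torch.tensor([0, 1, 2])
restored = pred_idx + 1            # -> [1, 2, 3], matching the original label ids
```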

yinyong14 commented 1 month ago

Thanks for the reply, I understand.