MendelXu / SAN

Open-vocabulary Semantic Segmentation
https://mendelxu.github.io/SAN/
MIT License

train and test on my own dataset #49

Closed ChunmengLiu1 closed 4 months ago

ChunmengLiu1 commented 7 months ago

Hi! I ran into some problems when switching to my own dataset. My dataset has 4 foreground classes and 1 background class. Following other issues, I registered it in ./san/data/datasets/register.py and __init__.py. I want to compute the mIoU over both the foreground and the background classes.

  1. I set CLASS_NAMES=(background, ...) in register.py (i.e. including background) and set MODEL.SAN.NUM_CLASSES 5, without changing mask_cls=F.softmax(mask_cls, dim=-1)[..., :-1]. Do you think the mIoU computed this way is reasonable? (A sketch of this registration is given after this list.)

  2. I don't understand why the output of mask_cls=F.softmax(mask_cls, dim=-1) has 6 channels in its last dimension (6 classes), and why you then drop the last one with [..., :-1]. Does this have something to do with the ignore value 255?

  3. While setting up my dataset I found the RGB image root and the semantic segmentation ground-truth root, but where is the class label root? Do the labels come from image-level labels or from the semantic segmentation ground truths? Thanks!
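
For reference, a minimal sketch of the registration described in 1., assuming the detectron2 DatasetCatalog/MetadataCatalog pattern used by the existing register_*.py files; the dataset name, paths, and class names below are placeholders, not the ones in this repository:

```python
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import load_sem_seg

# Index 0 is "background", indices 1..4 are the four foreground classes (placeholder names).
CLASS_NAMES = ("background", "class_a", "class_b", "class_c", "class_d")


def register_my_dataset(root="datasets/my_dataset"):
    image_dir = f"{root}/images"    # RGB images
    gt_dir = f"{root}/annotations"  # PNGs whose pixel values are class ids 0..4 (255 = ignore)
    name = "my_dataset_sem_seg"
    DatasetCatalog.register(
        name,
        lambda x=image_dir, y=gt_dir: load_sem_seg(y, x, gt_ext="png", image_ext="jpg"),
    )
    MetadataCatalog.get(name).set(
        stuff_classes=list(CLASS_NAMES),
        image_root=image_dir,
        sem_seg_root=gt_dir,
        evaluator_type="sem_seg",
        ignore_label=255,  # pixels with value 255 are excluded from the loss and from mIoU
    )
```

With five names registered this way, MODEL.SAN.NUM_CLASSES 5 then matches len(CLASS_NAMES).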

MendelXu commented 6 months ago
  1. I think it is reasonable. A potential problem, though, is that the name "background" is sent to the CLIP text encoder and converted into a fixed class embedding.
  2. Yes. The last channel corresponds to the ignored label (regions that should be ignored according to the dataset definition, or regions not matched to any object); see the sketch after this list.
  3. The class labels are defined by the segmentation ground truth together with the registry you define. The segmentation ground truth is an image of class ids, and each id is mapped to a category name by the registry. For example, if you define the class names as an ordered list, then id 0 in the ground truth is mapped to the first item in that list, as illustrated below.
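
To illustrate 2., a small sketch of the behaviour being described, assuming mask_cls carries the 5 registered classes plus one trailing "no object"/ignore channel (the shapes here are made up):

```python
import torch
import torch.nn.functional as F

num_queries, num_classes = 100, 5                     # 5 registered classes (background + 4 foreground)
mask_cls = torch.randn(num_queries, num_classes + 1)  # extra last channel = "no object" / ignored

# Softmax over all num_classes + 1 channels, then drop the trailing channel so that
# only the registered classes contribute to the final per-query class scores.
probs = F.softmax(mask_cls, dim=-1)[..., :-1]         # shape: (num_queries, num_classes)
```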
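
And a toy illustration of the id-to-name mapping in 3., using the same placeholder class names as in the registration sketch above (the pixel values are made up):

```python
CLASS_NAMES = ("background", "class_a", "class_b", "class_c", "class_d")

# A semantic segmentation ground truth stores one class id per pixel;
# 255 marks regions that are ignored during training and evaluation.
for pixel_value in (0, 2, 255):
    label = "ignored" if pixel_value == 255 else CLASS_NAMES[pixel_value]
    print(pixel_value, "->", label)  # 0 -> background, 2 -> class_b, 255 -> ignored
```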