KU-CVLAB / CAT-Seg

Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"
https://ku-cvlab.github.io/CAT-Seg/
MIT License

Support for datasets with more than 255 classes #34

Closed rizzoligiulia closed 1 month ago

rizzoligiulia commented 1 month ago

When using the CAT-Seg model with a dataset that has more than 255 classes, the current implementation appears to clip all class indices above 255 to 255. This is likely due to limitations in the underlying Detectron2 library, which CAT-Seg is built upon.

hsshin98 commented 1 month ago

Hi, this seems to come from one of our options here: https://github.com/KU-CVLAB/CAT-Seg/blob/6d3a188af95165147fe2f34a8237fa7d2633e784/cat_seg/modeling/transformer/model.py#L596 pad_len is set to 256 by default and thresholds the top 256 classes, which saves memory and time during inference. If you want to run the model with all of the classes, you can set pad_len = 0, which disables this feature.
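To illustrate what the option does, here is a simplified sketch of the behavior described above (not the actual CAT-Seg code; the function and argument names are made up for the example):

```python
import torch

def select_classes(class_scores: torch.Tensor, pad_len: int) -> torch.Tensor:
    """Illustrative sketch of the pad_len thresholding (hypothetical helper).

    class_scores: (num_classes,) per-class scores used to rank the classes.
    pad_len: keep only the top-`pad_len` classes; 0 keeps everything.
    """
    num_classes = class_scores.shape[0]
    if pad_len == 0 or num_classes <= pad_len:
        # pad_len = 0 disables the thresholding, so every class is kept.
        return torch.arange(num_classes)
    # Otherwise only the top-`pad_len` classes are carried forward,
    # which is what trims datasets with more than pad_len classes.
    return class_scores.topk(pad_len).indices
```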

rizzoligiulia commented 1 month ago

Hello, thank you for your answer.

The issue I am facing concerns the ground truth itself, i.e., the "sem_seg" in each batched_inputs. This also seems to happen for A-847 and P-459.

hsshin98 commented 3 weeks ago

Hi, we've checked in our environment and this issue doesn't occur with our setup. We've had reports where the detectron2 library was the problem, so re-installing detectron2 might help. The dataset preprocessing could also be the issue, so please check that you followed the dataset preparation steps properly, since the prepare_dataset.py files process the annotation files.

rizzoligiulia commented 3 weeks ago

The thing is:

  1. I can reproduce the results from your paper without problems.
  2. By opening the annotations with PIL, I do not observe any problem and the labels are mapped to reasonable values (i.e., no issue encountered while using the prepare_*.py scripts).
  3. By inspecting the "sem_seg" in the batched_inputs (inside cat_seg_model.py) for A-847 and P-459 I get only values <= 255.
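For reference, this is roughly how I am inspecting the labels inside cat_seg_model.py (a simplified sketch, not the exact code I run):

```python
def inspect_sem_seg(batched_inputs):
    # batched_inputs: the list of dicts that forward() receives;
    # each dict may carry a "sem_seg" ground-truth tensor.
    for i, item in enumerate(batched_inputs):
        if "sem_seg" not in item:
            continue
        sem_seg = item["sem_seg"]
        # For A-847 / P-459 I would expect label IDs above 255 here,
        # but the maximum never goes past 255.
        print(i, "dtype:", sem_seg.dtype, "max label:", int(sem_seg.max()))
```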

What would be the correct detectron2 version to use?

hsshin98 commented 2 weeks ago

Hi,

We've taken a deeper look into detectron2, and the problem seems to be in the default dataloader. This is what we've figured out so far:

  1. This does not happen during training, since training uses a custom mapper (cat_seg/data/dataset_mappers/mask_former_semantic_dataset_mapper.py).
  2. Evaluation is also unaffected, because the evaluator uses its own loading function (https://github.com/facebookresearch/detectron2/blob/5b72c27ae39f99db75d43f18fd1312e1ea934e60/detectron2/evaluation/sem_seg_evaluation.py#L27).
  3. The problem only appears during inference, in the "sem_seg" tensor inside the forward loop, and that tensor is never actually used there.

We're not sure what you want to do with the GT labels during inference, but this also happens with the newest detectron2 version, so the only solution seems to be modifying the mapper used by the test dataloader instead of relying on the default one. The problematic part is https://github.com/facebookresearch/detectron2/blob/5b72c27ae39f99db75d43f18fd1312e1ea934e60/detectron2/data/dataset_mapper.py#L159

The "L" option in read_image converts it into uint8, which cuts off all indexes above 255. Changing this line similar to what we have in the training loader like https://github.com/KU-CVLAB/CAT-Seg/blob/6d3a188af95165147fe2f34a8237fa7d2633e784/cat_seg/data/dataset_mappers/mask_former_semantic_dataset_mapper.py#L114 should do the trick.

Let me know if this helped!