Output mask is almost the same in all high confidence queries

Instructions To Reproduce the Issue:

what changes you made (git diff) or what code you wrote

I am fine-tuning the COCO DETR checkpoint on my own dataset of 24 classes (I changed the number of classes in the script). The dataset is in COCO format. Here is the annotation entry for a single image:

 [{"image_id": 1, "file_name": "frame000105.png", "segments_info":  [{"id": 1, "category_id": 1, "area": 10948.0, "bbox": [425.0, 50.0, 171.0, 88.0], "iscrowd": 0}, {"id": 2, "category_id": 1, "area": 709.0, "bbox": [213.0, 74.0, 38.0, 25.0], "iscrowd": 0}, {"id": 3, "category_id": 1, "area": 576.0, "bbox": [47.0, 69.0, 44.0, 20.0], "iscrowd": 0}, {"id": 4, "category_id": 3, "area": 613.0, "bbox": [300.0, 98.0, 57.0, 23.0], "iscrowd": 0}, {"id": 5, "category_id": 15, "area": 159.0, "bbox": [314.0, 7.0, 11.0, 24.0], "iscrowd": 0}, {"id": 6, "category_id": 16, "area": 69.0, "bbox": [344.0, 115.0, 14.0, 7.0], "iscrowd": 0}, {"id": 7, "category_id": 16, "area": 88.0, "bbox": [303.0, 110.0, 26.0, 5.0], "iscrowd": 0}, {"id": 8, "category_id": 16, "area": 15620.0, "bbox": [279.0, 200.0, 320.0, 70.0], "iscrowd": 0}, {"id": 9, "category_id": 1, "area": 1037.0, "bbox": [0.0, 51.0, 40.0, 34.0], "iscrowd": 0}, {"id": 10, "category_id": 1, "area": 129.0, "bbox": [630.0, 118.0, 10.0, 20.0], "iscrowd": 0}, {"id": 11, "category_id": 1, "area": 239.0, "bbox": [109.0, 72.0, 17.0, 17.0], "iscrowd": 0}, {"id": 12, "category_id": 2, "area": 1360.0, "bbox": [345.0, 40.0, 30.0, 81.0], "iscrowd": 0}, {"id": 13, "category_id": 19, "area": 14462.0, "bbox": [206.0, 112.0, 274.0, 75.0], "iscrowd": 0}, {"id": 14, "category_id": 3, "area": 503.0, "bbox": [371.0, 110.0, 66.0, 13.0], "iscrowd": 0}, {"id": 15, "category_id": 3, "area": 100393.0, "bbox": [0.0, 187.0, 640.0, 213.0], "iscrowd": 0}, {"id": 16, "category_id": 16, "area": 56.0, "bbox": [370.0, 116.0, 10.0, 6.0], "iscrowd": 0}, {"id": 17, "category_id": 19, "area": 1718.0, "bbox": [49.0, 133.0, 88.0, 35.0], "iscrowd": 0}, {"id": 18, "category_id": 2, "area": 17095.0, "bbox": [93.0, 0.0, 133.0, 230.0], "iscrowd": 0}]}
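An entry like this can be checked against its panoptic PNG before training. The sketch below is only an illustration. It assumes the usual COCO panoptic convention that each pixel's segment id is stored in the PNG colour as R + 256*G + 256^2*B, and that id 0 means unlabeled. The helper names `rgb_to_id` and `check_segments` and the tiny in-memory image are hypothetical.

```python
def rgb_to_id(rgb):
    """Decode a segment id from an (R, G, B) pixel (assumed COCO panoptic encoding)."""
    r, g, b = rgb
    return r + 256 * g + 256 * 256 * b


def check_segments(pixels, segments_info):
    """Compare the ids found in the PNG pixels with the ids listed in segments_info."""
    png_ids = {rgb_to_id(p) for row in pixels for p in row}
    png_ids.discard(0)  # assumption: 0 is the void / unlabeled id
    json_ids = [s["id"] for s in segments_info]
    return {
        "duplicate_json_ids": sorted({i for i in json_ids if json_ids.count(i) > 1}),
        "missing_in_png": sorted(set(json_ids) - png_ids),
        "missing_in_json": sorted(png_ids - set(json_ids)),
    }


# Toy 2x3 "image" standing in for a real panoptic PNG (hypothetical data).
pixels = [
    [(1, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (2, 0, 0), (2, 0, 0)],
]
segments_info = [
    {"id": 1, "category_id": 1},
    {"id": 2, "category_id": 1},
    {"id": 3, "category_id": 3},  # listed in the JSON but absent from the pixels
]
report = check_segments(pixels, segments_info)
print(report)
```

If two instances of the same category share one id in the PNG, they are a single segment as far as the loader is concerned, so a check like this may be worth running before looking at the model.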

what exact command you run: To fine-tune the bounding boxes starting from the modified checkpoint (with the class-head weights removed), I ran the following:

python3 main.py --dataset_file coco_panoptic --coco_path panoptic/ --coco_panoptic_path panoptic/ --epochs 300 --lr=1e-4 --batch_size=8 --num_workers=4 --output_dir="outputs" --resume="detr-r50_no-class-head.pth"
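The command above resumes from `detr-r50_no-class-head.pth`; how that file was produced is not shown here. One common approach, sketched below as an assumption, is to drop the classification-head entries from the released checkpoint so that a new 24-class head is initialised from scratch. The key prefix `class_embed.` and the helper `strip_class_head` are assumptions, and the toy dict stands in for a real checkpoint.

```python
def strip_class_head(checkpoint, prefix="class_embed."):
    """Return a new checkpoint dict containing only the model weights,
    minus every entry whose key starts with `prefix` (assumed head name)."""
    state = checkpoint["model"]
    kept = {k: v for k, v in state.items() if not k.startswith(prefix)}
    # Optimizer state and epoch are deliberately not copied over.
    return {"model": kept}


# Toy stand-in for a loaded checkpoint (hypothetical keys and values).
toy = {
    "model": {
        "class_embed.weight": [[0.0]],
        "class_embed.bias": [0.0],
        "bbox_embed.layers.0.weight": [[0.0]],
        "query_embed.weight": [[0.0]],
    },
    "epoch": 499,
}
stripped = strip_class_head(toy)
print(sorted(stripped["model"]))

# With PyTorch installed, the same function would sit between
# torch.load(<released checkpoint>, map_location="cpu") and
# torch.save(stripped, "detr-r50_no-class-head.pth").
```

A checkpoint with missing keys only loads if the training script calls `load_state_dict` with `strict=False`, so whether this works depends on how the script being used loads the checkpoint.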

To fine-tune the segmentation head, I ran:

 python3 main.py --masks --dataset_file coco_panoptic --coco_path panoptic/ --coco_panoptic_path panoptic/ --epochs 25 --lr=1e-4 --lr_drop 15 --batch_size=4 --num_workers=4 --output_dir="segm" --frozen_weights outputs/checkpoint.pth

what you observed (including full logs):

[image: detr]

Expected behavior:

Although the mask is accurate as a whole, there is no instance separation: all of the high-confidence queries output almost the same mask.
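One way to quantify "almost the same" is the pairwise IoU between the binarised masks of the high-confidence queries: values near 1 for every pair would confirm that the queries are not separating instances. The sketch below is an assumed diagnostic, not part of the DETR code. Masks are plain nested lists of 0/1, and `mask_iou` and `pairwise_iou` are hypothetical helper names.

```python
from itertools import combinations


def mask_iou(a, b):
    """Intersection over union of two binary masks given as nested lists."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


def pairwise_iou(masks):
    """IoU for every unordered pair of masks, keyed by (i, j) with i < j."""
    return {
        (i, j): mask_iou(masks[i], masks[j])
        for i, j in combinations(range(len(masks)), 2)
    }


# Toy masks standing in for thresholded per-query outputs (hypothetical data).
masks = [
    [[1, 1, 0], [1, 1, 0]],
    [[1, 1, 0], [1, 0, 0]],
    [[0, 0, 1], [0, 0, 1]],
]
ious = pairwise_iou(masks)
for pair, value in ious.items():
    print(pair, round(value, 2))
```

Reporting these numbers alongside the per-query confidence scores would make the problem easier to compare across checkpoints.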

facebookresearch / detr

Output mask is almost the same in all high confidence queries #483
