facebookresearch / detr

End-to-End Object Detection with Transformers

Weird behaviour with instance segmentation #132

Closed m-klasen closed 4 years ago

m-klasen commented 4 years ago

Hello, I'm trying to use Detr for instance segmentation and trained following the guidelines:

  1. Train the box model first, then train with `--masks` without any coco_panoptic flags. The training curve for my segm mAP looks good; however, I noticed that the mask spans all instances for every valid prediction instead of the expected one mask per instance. Boxes work fine though.

    import math
    import matplotlib.pyplot as plt

    out = model(img)
    probas = out['pred_logits'].softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > 0.5
    ncols = 2
    fig, axs = plt.subplots(ncols=ncols, nrows=math.ceil(keep.sum().item() / ncols), figsize=(18, 10))
    for line in axs:
        for a in line:
            a.axis('off')
    for i, mask in enumerate(out["pred_masks"][keep]):
        ax = axs[i // ncols, i % ncols]
        ax.imshow(mask, cmap="cividis")
        ax.axis('off')
  2. Target masks in each image are separated (e.g. 4 individual masks for each instance) in the train/valid dataloader.

  3. The problem does not (really) occur with the out-of-the-box panoptic model (although the results are obviously poor due to the lack of fine-tuning):

    model, postprocessor = torch.hub.load('facebookresearch/detr', 'detr_resnet101_panoptic', pretrained=True, return_postprocessor=True, num_classes=250)

So what could be the cause of this problem?

alcinos commented 4 years ago

Hi @mlk1337

Could you clarify what you mean by "The training curve for my segm mAP looks good"? Is it close to the box AP?

I'm not too sure what's happening here; my best bet would be that your ground-truth masks are somehow merged together. As a first step, I'd recommend visualizing them to be 100% sure they are what you expect. If you need visualization functions, you can use either Detectron2 (similar to the notebook) or, for example, https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/visualize.py

Best of luck.

m-klasen commented 4 years ago

My box mAP is 0.65 and my segm mAP is 0.45 (although many frames do not feature multiple instances, in which case the issue will not manifest itself). I visualized my ground truths inside my train loop, right before the loss calculation:

        outputs = model(samples)
        #Batchsize=1
        for i,mask in enumerate(targets[0]["masks"].cpu()):
            fig, axs = plt.subplots(ncols=1, nrows=1)
            axs.imshow(mask.numpy(), cmap="cividis")
            plt.savefig(f'test_gts/{int(targets[0]["image_id"])}-{i}.jpg')
        loss_dict = criterion(outputs, targets)

Maybe this is an overfit issue of a small dataset or improper training. Will investigate this further.

fmassa commented 4 years ago

Very interesting results!

This could potentially be overfitting to the fact that you mostly have only one instance per image. I wonder if data augmentation techniques like montages / mosaics could help, if this is the case?
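
To make the montage/mosaic idea concrete, here is a minimal sketch (not part of DETR, and not code from this thread) that tiles four equally sized images into a 2x2 canvas and shifts their instance masks accordingly; boxes would need the same offset logic, which is omitted here. The function name and the equal-size assumption are illustrative only.

    import torch

    def make_mosaic(images, masks_list):
        """Tile four 3xHxW images into a 2x2 mosaic and shift their masks accordingly."""
        assert len(images) == 4 and len(masks_list) == 4
        _, h, w = images[0].shape
        canvas = torch.zeros(3, 2 * h, 2 * w)
        shifted_masks = []
        offsets = [(0, 0), (0, w), (h, 0), (h, w)]
        for img, masks, (dy, dx) in zip(images, masks_list, offsets):
            canvas[:, dy:dy + h, dx:dx + w] = img
            for m in masks:                      # each m is an HxW binary mask
                big = torch.zeros(2 * h, 2 * w, dtype=m.dtype)
                big[dy:dy + h, dx:dx + w] = m
                shifted_masks.append(big)
        return canvas, shifted_masks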

alcinos commented 4 years ago

One other possibility would be to try to finetune from the panoptic model (including the segmentation head). Is it what you are doing?

m-klasen commented 4 years ago

Good ideas. I will try (1) a stronger regularization approach and (2) fine-tuning from the panoptic pretrained weights; there might be a more significant difference in the backbone & transformer weights due to the usage of intermediate ResNet features with masks.

m-klasen commented 4 years ago

Training both bbox & segm directly from the pretrained panoptic weights, with only the class weights removed and nothing frozen, was the solution. I now get box mAP 0.65 & segm mAP 0.60. Thank you for your help.

[image: predicted instance masks]
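
A minimal sketch of this recipe (a reconstruction, not the exact code used above): take the published panoptic checkpoint, drop only the classification head, and save the result so it can be passed to main.py via `--resume`. The output filename is a placeholder, and the key filter assumes every classification-head tensor contains "class_embed" in its name.

    import torch

    # Load the pretrained panoptic model from the DETR hub (same call as earlier in this thread).
    model, postprocessor = torch.hub.load(
        'facebookresearch/detr', 'detr_resnet101_panoptic',
        pretrained=True, return_postprocessor=True)

    # Keep everything except the classification head so num_classes can differ.
    state_dict = {k: v for k, v in model.state_dict().items()
                  if 'class_embed' not in k}

    torch.save({'model': state_dict}, 'detr-r101-panoptic_no_class_head.pth')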

fmassa commented 4 years ago

Great! Glad to know that this is working now!

Given that the issue seems to have been resolved, I'm closing the issue, but let us know if you have further questions

alcinos commented 4 years ago

Wow those masks look great indeed! Glad it worked for you :)

gfiameni commented 3 years ago

Hi @m-klasen, may I ask how you trained the model for segmentation? Did you initially train the box model (end-to-end, or from pretrained weights?) and then the segmentation head? And if so, how do you create the model for inference? Since my fine-tuned box model doesn't include masks, I have to load the **detr_resnet101_panoptic** one, but I cannot take the pretrained weights (i.e. pretrained = **False**) as the weight and bias sizes are different from mine.

I am doing this:

model, postprocessor = torch.hub.load('facebookresearch/detr', 'detr_resnet101_panoptic',
                                      pretrained=False, return_postprocessor=True, num_classes=15)

checkpoint = torch.load('/workspace/detr/output/my_segmentation_head/checkpoint.pth', map_location='cpu')

model.load_state_dict(checkpoint['model'], strict=False)

model.eval()

Thanks for any input.

Dicko87 commented 3 years ago

Hey @gfiameni how are you? I am wondering if you have made any progress with this, as this is something I want to look into also.

m-klasen commented 3 years ago

Hello, my basic methodology to train instance segmentation on a new dataset is (assuming we do not change the 100 transformer queries parameter):

1. Use a pretrained object detection model from the Model ZOO to prepare a model_weights file that includes all pretrained weights except for the fully connected class layer (or just resume a model-zoo checkpoint directly and deal with the class-layer shape mismatch in main.py).
2. Use the weights file from (1) via `--resume` to train object detection end-to-end until the mAP converges (freezing the backbone and/or transformer could speed things up here, since in my opinion their pretrained weights are already good).
3. Train the instance segmentation using the fine-tuned weights from (2). Use `--masks` and `--frozen_weights=$ckpt`, so that only the mask head will get trained.
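
As a rough command-line sketch of steps (2) and (3) — the dataset name, paths, and checkpoint filenames below are placeholders, not values taken from this thread:

    # step (2): fine-tune box detection, resuming from a checkpoint with the class head removed
    python main.py --dataset_file mycoco --coco_path /path/to/mycoco \
        --resume detr-r50_no_class_head.pth --output_dir outputs_boxes

    # step (3): train only the mask head on top of the frozen detector from step (2)
    python main.py --dataset_file mycoco --coco_path /path/to/mycoco \
        --masks --frozen_weights outputs_boxes/checkpoint.pth --output_dir outputs_masks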

Dicko87 commented 3 years ago

Hey @m-klasen, thank you very much for your prompt reply. I originally trained a custom DETR model on my own dataset and it did very well. I was thinking I could use my weights from that model. I was just wondering how to go about resuming that model for segmentation.

I am currently looking for software to make my segmentation masks (PNG files), and once I have those I am wondering how to adapt the DETR code / files to train for segmentation rather than object detection. For object detection I used the command

    python main.py --dataset_file mycoco --coco_path /home/detr/datasets/mycoco --epochs 300 --lr=1e-5 --batch_size=4 --num_workers=4 --weight_decay=0.001 --output_dir="outputs" --resume="detr-r50_ready_to_train.pth"

and was wondering what modifications need to be made to tell it to train for segmentation. I am assuming I would have to use the coco_panoptic.py file rather than coco.py.

Any help would be much appreciated, thank you :)

m-klasen commented 3 years ago

You need to follow step (3): use `--masks` and replace `--resume` with `--frozen_weights=$your_object_detection_model`.

Dicko87 commented 3 years ago

Thanks @m-klasen, so I need to:

1. Create my segmentation PNG masks.
2. Put them in an appropriate folder.
3. Direct the model to the masks folder etc.
4. Use your step 3 above? Use `--masks` and replace `--resume` with `--frozen_weights=$your_object_detection_model`.

m-klasen commented 3 years ago

The segmentation mask ground truths must be in the COCO format, e.g. within your train.json and valid.json (https://cocodataset.org/#format-data).
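
For reference, a minimal sketch of what a single COCO instance annotation looks like, written as a Python dict; the field values are made up for illustration, see the link above for the authoritative format:

    # One entry of the "annotations" list in train.json / valid.json (illustrative values only).
    annotation = {
        "id": 1,                    # unique annotation id
        "image_id": 42,             # id of the image this instance belongs to
        "category_id": 3,           # class label
        "bbox": [100.0, 50.0, 80.0, 60.0],   # [x, y, width, height]
        "area": 2400.0,             # mask area in pixels
        "iscrowd": 0,               # 0 = polygon segmentation, 1 = RLE
        "segmentation": [[100.0, 50.0, 180.0, 50.0, 180.0, 110.0, 100.0, 110.0]],  # list of polygons [x1, y1, x2, y2, ...]
    }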

Dicko87 commented 3 years ago

Thanks @m-klasen, I have written a script to convert my previous object detection json files into the COCO 2017 segmentation format. I found it strange that the json file still only contains the coordinates of the bounding boxes around the objects, as opposed to polygon points. I guess the PNG images are enough for the model to train with, without the polygon points.

Dicko87 commented 3 years ago

Hi guys, I've run into a problem with segmentation; any help would be appreciated, thank you.

    line 78, in rgb2id
        return int(color[0] + 256 * color[1] + 256 * 256 * color[2])
    TypeError: only size-1 arrays can be converted to Python scalars

I have my panoptic_coco (custom) dataset where each image contains multiple masks. The json file has one annotation per image, describing the bounding box for each mask as shown in the detr examples. I'm not sure what is wrong.

I think it is something to do with https://cocodataset.org/#format-data, Panoptic Segmentation, item 2, which says: for each annotation, per-pixel segment ids are stored as a single PNG at annotation.file_name. The PNGs are in a folder with the same name as the JSON, i.e., annotations/name/ for annotations/name.json. Each segment (whether it's a stuff or thing segment) is assigned a unique id. Unlabeled pixels (void) are assigned a value of 0. Note that when you load the PNG as an RGB image, you will need to compute the ids via ids = R + G*256 + B*256^2.
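
A minimal sketch of that id computation over a whole panoptic PNG (the file path is a placeholder), vectorised so it sidesteps the scalar int() conversion shown in the traceback above:

    import numpy as np
    from PIL import Image

    # Load the panoptic PNG as an HxWx3 array and compute per-pixel segment ids
    # via ids = R + G*256 + B*256^2 for the whole image at once.
    png = np.asarray(Image.open('annotations/name/000000000001.png').convert('RGB'), dtype=np.uint32)
    ids = png[..., 0] + 256 * png[..., 1] + 256 * 256 * png[..., 2]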

gfiameni commented 3 years ago

Hi @m-klasen

> (1) Use a pretrained object detection model from the Model ZOO to prepare a model_weights file that includes all pretrained weights except for the fully connected class layer (or just resume a model-zoo checkpoint directly and deal with the class-layer shape mismatch in main.py)

What exactly do you mean by point #1?

I delete the class_embed weights and biases before fine-tuning:

checkpoint = torch.hub.load_state_dict_from_url(
            url='https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth',
            map_location='cpu',
            check_hash=True)

del checkpoint["model"]["class_embed.weight"]
del checkpoint["model"]["class_embed.bias"]

The output of this operation becomes the --resume input.
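
Presumably the stripped checkpoint then has to be written back to disk before it can be passed to `--resume` (an assumption about the workflow, since the snippet above only edits the dict in memory; the filename is a placeholder):

    torch.save(checkpoint, 'detr-r50_no_class_head.pth')  # then: --resume detr-r50_no_class_head.pth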

MLDeS commented 1 year ago

Hi @m-klasen, I ran into a similar issue as your original question in this thread. I am trying to predict instance masks and end up with "masks that span all instances for every valid prediction, instead of the expected one mask per instance."

I have followed this thread but am a bit lost in the discussion. I understood that you started training with frozen weights. But what I wanted to understand is the possible cause of this behaviour.