facebookresearch / detr

End-to-End Object Detection with Transformers

Weird behaviour with instance segmentation #132

Closed m-klasen closed 4 years ago

m-klasen commented 4 years ago

Hello, I'm trying to use Detr for instance segmentation and trained following the guidelines:

  1. Train the box model first, then train with `--masks` without any coco_panoptic flags. The training curve for my segm mAP looks good; however, I noticed that the mask spans all instances for every valid prediction instead of the expected one mask per instance. Boxes work fine though.

    import math
    import matplotlib.pyplot as plt

    out = model(img)
    probas = out['pred_logits'].softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > 0.5
    ncols = 2
    fig, axs = plt.subplots(ncols=ncols, nrows=math.ceil(keep.sum().item() / ncols), figsize=(18, 10))
    for line in axs:
        for a in line:
            a.axis('off')
    for i, mask in enumerate(out["pred_masks"][keep]):
        ax = axs[i // ncols, i % ncols]
        ax.imshow(mask, cmap="cividis")
        ax.axis('off')
  2. Target masks in each image are separated (e.g. 4 individual masks for each instance) in the train/valid dataloader.

  3. The problem does not (really) occur with the out-of-the-box panoptic model (although the results are obviously poor due to the lack of fine-tuning):

    model, postprocessor = torch.hub.load('facebookresearch/detr', 'detr_resnet101_panoptic', pretrained=True, return_postprocessor=True, num_classes=250)

So what could be the cause of this problem?

alcinos commented 4 years ago

Hi @mlk1337

Could you clarify what you mean by "The training curve for my segm mAP looks good"? Is it close to the box AP?

I'm not too sure what's happening here; my best bet would be that your ground-truth masks are somehow merged together. As a first step, I'd recommend visualizing them to be 100% sure they are what you expect. If you need visualization functions, you can use either Detectron2 (similar to the notebook) or, for example, https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/visualize.py

Best of luck.

m-klasen commented 4 years ago

My box mAP is 0.65 and my segm mAP is 0.45 (although many frames do not feature multiple instances, in which case the issue will not manifest itself). I visualized my ground truths inside my train loop, right before the loss calculation:

        outputs = model(samples)
        #Batchsize=1
        for i,mask in enumerate(targets[0]["masks"].cpu()):
            fig, axs = plt.subplots(ncols=1, nrows=1)
            axs.imshow(mask.numpy(), cmap="cividis")
            plt.savefig(f'test_gts/{int(targets[0]["image_id"])}-{i}.jpg')
        loss_dict = criterion(outputs, targets)

Maybe this is an overfit issue of a small dataset or improper training. Will investigate this further.

fmassa commented 4 years ago

Very interesting results!

This could potentially be overfitting to the fact that you mostly have only one instance per image. I wonder if data augmentation techniques like montages / mosaics could help, if this is the case?
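
To make the montage/mosaic idea concrete, here is a minimal sketch (not part of DETR, and not code from this thread) that tiles four equally sized images into a 2x2 canvas and shifts their instance masks accordingly; boxes would need the same offset logic, which is omitted here. The function name and the equal-size assumption are illustrative only.

    import torch

    def make_mosaic(images, masks_list):
        """Tile four 3xHxW images into a 2x2 mosaic and shift their masks accordingly."""
        assert len(images) == 4 and len(masks_list) == 4
        _, h, w = images[0].shape
        canvas = torch.zeros(3, 2 * h, 2 * w)
        shifted_masks = []
        offsets = [(0, 0), (0, w), (h, 0), (h, w)]
        for img, masks, (dy, dx) in zip(images, masks_list, offsets):
            canvas[:, dy:dy + h, dx:dx + w] = img
            for m in masks:                      # each m is an HxW binary mask
                big = torch.zeros(2 * h, 2 * w, dtype=m.dtype)
                big[dy:dy + h, dx:dx + w] = m
                shifted_masks.append(big)
        return canvas, shifted_masks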

alcinos commented 4 years ago

One other possibility would be to try to finetune from the panoptic model (including the segmentation head). Is it what you are doing?

m-klasen commented 4 years ago

Good ideas. I will try (1) a stronger regularization approach and (2) fine-tuning from the panoptic pretrained weights; there might be a more significant difference in the backbone & transformer weights due to the usage of intermediate ResNet features with masks.

m-klasen commented 4 years ago

Training both bbox & segm directly from the pretrained panoptic weights, with only the class weights removed and nothing frozen, was the solution. I now get box mAP 0.65 & segm mAP 0.60. Thank you for your help.

[image: predicted instance masks]
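
A minimal sketch of this recipe (a reconstruction, not the exact code used above): take the published panoptic checkpoint, drop only the classification head, and save the result so it can be passed to main.py via `--resume`. The output filename is a placeholder, and the key filter assumes every classification-head tensor contains "class_embed" in its name.

    import torch

    # Load the pretrained panoptic model from the DETR hub (same call as earlier in this thread).
    model, postprocessor = torch.hub.load(
        'facebookresearch/detr', 'detr_resnet101_panoptic',
        pretrained=True, return_postprocessor=True)

    # Keep everything except the classification head so num_classes can differ.
    state_dict = {k: v for k, v in model.state_dict().items()
                  if 'class_embed' not in k}

    torch.save({'model': state_dict}, 'detr-r101-panoptic_no_class_head.pth')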

fmassa commented 4 years ago

Great! Glad to know that this is working now!

Given that the issue seems to have been resolved, I'm closing the issue, but let us know if you have further questions

alcinos commented 4 years ago

Wow those masks look great indeed! Glad it worked for you :)

gfiameni commented 3 years ago

Hi @m-klasen, may I ask how you trained the model for segmentation? Did you initially train the box model (end-to-end, or from pretrained weights?) and then the segmentation head? And if so, how do you create the model for inference? Since my fine-tuned box model doesn't include masks, I have to load the **detr_resnet101_panoptic** one, but I cannot take the pretrained weights (i.e. pretrained = **False**) as the weight and bias sizes are different from mine.

I am doing this:

model, postprocessor = torch.hub.load('facebookresearch/detr', 'detr_resnet101_panoptic',
                                      pretrained=False, return_postprocessor=True, num_classes=15)

checkpoint = torch.load('/workspace/detr/output/my_segmentation_head/checkpoint.pth', map_location='cpu')

model.load_state_dict(checkpoint['model'], strict=False)

model.eval()

Thanks for any input.

Dicko87 commented 3 years ago

Hey @gfiameni how are you? I am wondering if you have made any progress with this, as this is something I want to look into also.

m-klasen commented 3 years ago

Hello, my basic methodology to train instance segmentation on a new dataset is (assuming we do not change the 100 transformer queries parameter):

1. Use a pretrained object detection model from the Model ZOO to prepare a model_weights file that includes all pretrained weights except for the fully connected class layer (or just resume a model-zoo checkpoint directly and deal with the class-layer shape mismatch in main.py).
2. Use the weights file from (1) via `--resume` to train object detection end-to-end until the mAP converges (freezing the backbone and/or transformer could speed things up here, since in my opinion their pretrained weights are already good).
3. Train the instance segmentation using the fine-tuned weights from (2). Use `--masks` and `--frozen_weights=$ckpt`, so that only the mask head will get trained.
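
As a rough command-line sketch of steps (2) and (3) — the dataset name, paths, and checkpoint filenames below are placeholders, not values taken from this thread:

    # step (2): fine-tune box detection, resuming from a checkpoint with the class head removed
    python main.py --dataset_file mycoco --coco_path /path/to/mycoco \
        --resume detr-r50_no_class_head.pth --output_dir outputs_boxes

    # step (3): train only the mask head on top of the frozen detector from step (2)
    python main.py --dataset_file mycoco --coco_path /path/to/mycoco \
        --masks --frozen_weights outputs_boxes/checkpoint.pth --output_dir outputs_masks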

Dicko87 commented 3 years ago

Hey @m-klasen, thank you very much for your prompt reply. I originally trained a custom DETR model on my own dataset and it did very well. I was thinking I could use my weights from that model. I was just wondering how to go about resuming that model for segmentation.

I am currently looking for software to make my segmentation masks (PNG files), and once I have those I am wondering how to adapt the DETR code / files to train for segmentation rather than object detection. For object detection I used the command

    python main.py --dataset_file mycoco --coco_path /home/detr/datasets/mycoco --epochs 300 --lr=1e-5 --batch_size=4 --num_workers=4 --weight_decay=0.001 --output_dir="outputs" --resume="detr-r50_ready_to_train.pth"

and was wondering what modifications need to be made to tell it to train for segmentation. I am assuming I would have to use the coco_panoptic.py file rather than coco.py.

Any help would be much appreciated, thank you :)

m-klasen commented 3 years ago

You need to follow step (3): use `--masks` and replace `--resume` with `--frozen_weights=$your_object_detection_model`.

Dicko87 commented 3 years ago

Thanks @m-klasen, so I need to:

1. Create my segmentation PNG masks.
2. Put them in an appropriate folder.
3. Direct the model to the masks folder etc.
4. Use your step 3 above? Use `--masks` and replace `--resume` with `--frozen_weights=$your_object_detection_model`.

m-klasen commented 3 years ago

The segmentation mask ground truths must be in the COCO format, e.g. within your train.json and valid.json (https://cocodataset.org/#format-data).
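
For reference, a minimal sketch of what a single COCO instance annotation looks like, written as a Python dict; the field values are made up for illustration, see the link above for the authoritative format:

    # One entry of the "annotations" list in train.json / valid.json (illustrative values only).
    annotation = {
        "id": 1,                    # unique annotation id
        "image_id": 42,             # id of the image this instance belongs to
        "category_id": 3,           # class label
        "bbox": [100.0, 50.0, 80.0, 60.0],   # [x, y, width, height]
        "area": 2400.0,             # mask area in pixels
        "iscrowd": 0,               # 0 = polygon segmentation, 1 = RLE
        "segmentation": [[100.0, 50.0, 180.0, 50.0, 180.0, 110.0, 100.0, 110.0]],  # list of polygons [x1, y1, x2, y2, ...]
    }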

Dicko87 commented 3 years ago

Thanks @m-klasen, I have written a script to convert my previous object detection json files into the COCO 2017 segmentation format. I found it strange that the json file still only contains the coordinates of the bounding boxes around the objects, as opposed to polygon points. I guess the PNG images are enough for the model to train with, without the polygon points.

Dicko87 commented 3 years ago

Hi guys, I've run into a problem with segmentation; any help would be appreciated, thank you.

    line 78, in rgb2id
        return int(color[0] + 256 * color[1] + 256 * 256 * color[2])
    TypeError: only size-1 arrays can be converted to Python scalars

I have my panoptic_coco (custom) dataset where each image contains multiple masks. The json file has one annotation per image, describing the bounding box for each mask as shown in the detr examples. I'm not sure what is wrong.

I think it is something to do with https://cocodataset.org/#format-data, Panoptic Segmentation, item 2, which says: for each annotation, per-pixel segment ids are stored as a single PNG at annotation.file_name. The PNGs are in a folder with the same name as the JSON, i.e., annotations/name/ for annotations/name.json. Each segment (whether it's a stuff or thing segment) is assigned a unique id. Unlabeled pixels (void) are assigned a value of 0. Note that when you load the PNG as an RGB image, you will need to compute the ids via ids = R + G*256 + B*256^2.
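
A minimal sketch of that id computation over a whole panoptic PNG (the file path is a placeholder), vectorised so it sidesteps the scalar int() conversion shown in the traceback above:

    import numpy as np
    from PIL import Image

    # Load the panoptic PNG as an HxWx3 array and compute per-pixel segment ids
    # via ids = R + G*256 + B*256^2 for the whole image at once.
    png = np.asarray(Image.open('annotations/name/000000000001.png').convert('RGB'), dtype=np.uint32)
    ids = png[..., 0] + 256 * png[..., 1] + 256 * 256 * png[..., 2]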

gfiameni commented 3 years ago

Hi @m-klasen

> (1) Use a pretrained object detection model from the Model ZOO to prepare a model_weights file that includes all pretrained weights except for the fully connected class layer (or just resume a model-zoo checkpoint directly and deal with the class-layer shape mismatch in main.py)

What exactly do you mean by point #1?

I delete the class_embed weights and biases before fine-tuning:

checkpoint = torch.hub.load_state_dict_from_url(
            url='https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth',
            map_location='cpu',
            check_hash=True)

del checkpoint["model"]["class_embed.weight"]
del checkpoint["model"]["class_embed.bias"]

The output of this operation becomes the --resume input.
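
Presumably the stripped checkpoint then has to be written back to disk before it can be passed to `--resume` (an assumption about the workflow, since the snippet above only edits the dict in memory; the filename is a placeholder):

    torch.save(checkpoint, 'detr-r50_no_class_head.pth')  # then: --resume detr-r50_no_class_head.pth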

MLDeS commented 1 year ago

Hi @m-klasen, I ran into a similar issue as your original question in this thread. I am trying to predict instance masks and end up with "masks that span all instances for every valid prediction, instead of the expected one mask per instance."

I have followed this thread but am a bit lost in the discussion. I understood that you started training with frozen weights. But what I wanted to understand is the possible cause of this behaviour.