Support instance mask annotation with mask.png

txytju commented 5 years ago

🚀 Feature

An instance mask image could be used as the ground-truth label. For example, in the PNG file, every instance is labeled with a unique color.

Motivation

Currently, instance annotations are COCO-style, in which each instance mask is annotated by polygons. However, if an instance mask has holes, the polygon annotation fails. A binary instance mask PNG, by contrast, can handle holes in the instance masks.
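To make the hole problem concrete, here is a minimal NumPy sketch. It is not taken from the codebase, and `ring` and `filled` are illustrative names. A ring-shaped instance is stored exactly as a binary mask, while a single outer-boundary polygon (rasterized here as a filled square, for simplicity) also covers the hole.

```python
import numpy as np

# Ring-shaped instance: a 5x5 square with a 3x3 hole in the middle.
ring = np.zeros((7, 7), dtype=np.uint8)
ring[1:6, 1:6] = 1
ring[2:5, 2:5] = 0

# What a single outer-boundary polygon would rasterize to: the hole is filled.
filled = np.zeros((7, 7), dtype=np.uint8)
filled[1:6, 1:6] = 1

# Pixels that the polygon version labels as foreground by mistake.
wrong_pixels = int((filled != ring).sum())
print("instance pixels:", int(ring.sum()), "polygon pixels:", int(filled.sum()))
print("pixels wrongly marked as foreground:", wrong_pixels)
```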

fmassa commented 5 years ago

There is an open PR that adds support for binary masks in https://github.com/facebookresearch/maskrcnn-benchmark/pull/150

You can try it out to see if it works for your use-case. I still haven't had the time to pull it down and try it out for myself though, that's why I haven't merged the PR yet.

txytju commented 5 years ago

OK, I will try it today, and if it works I will report it here. In my opinion, binary masks are a better representation than polygons, and I think they should be the default form of instance mask.

txytju commented 5 years ago

I have tested the code and written a corresponding dataloader for image-mask input data, modified from COCODataset. Could you help check it, especially the correspondence between the image information (like size) and the image itself? Also, generating instance masks online (during training) is quite slow (about 10 times slower than with polygons), so once I have confirmed that the dataloader logic is right, I will generate the instance masks offline. Thank you in advance. If the code works, maybe #150 should be merged into the master branch.

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import os
import numpy as np
import torch
import torchvision
from PIL import Image

from maskrcnn_benchmark.structures.bounding_box import BoxList
from maskrcnn_benchmark.structures.segmentation_mask import SegmentationMask

class COCODatasetBinaryMask(torchvision.datasets.coco.CocoDetection):

    def __init__(
        self, ann_file, root, transforms=None
    ):
        super(COCODatasetBinaryMask, self).__init__(root, ann_file)

        # sort indices for reproducible results
        self.ids = sorted(self.ids)

        self.json_category_id_to_contiguous_id = {
            v: i + 1 for i, v in enumerate(self.coco.getCatIds())
        }
        self.contiguous_category_id_to_json_id = {
            v: k for k, v in self.json_category_id_to_contiguous_id.items()
        }
        self.id_to_img_map = {k: v for k, v in enumerate(self.ids)}
        self.transforms = transforms

        self.root = root
        self.image_root = self.root + "images/"
        self.mask_root = self.root + "masks/"

        image_names = [image_name.split(".")[0] for image_name in os.listdir(self.image_root) if ".jpg" in image_name]
        mask_names = [mask_name.replace("_mask","").split(".")[0] for mask_name in os.listdir(self.mask_root) if ".png" in mask_name]
        self.names = list(set(image_names) & set(mask_names))

    def __getitem__(self, idx):

        name = self.names[idx]
        image_path = self.image_root + name + ".jpg"
        mask_path = self.mask_root + name + "_mask.png"

        img = Image.open(image_path)
        mask = np.array(Image.open(mask_path))

        boxes, masks = self._get_insts_bbox_mask_from_mask(mask, third_object_color="red")

        # boxes : a list of list [[x,y,w,h],[x,y,w,h],[...],[...],]
        boxes = torch.as_tensor(boxes).reshape(-1, 4)  # guard against no boxes
        target = BoxList(boxes, img.size, mode="xywh").convert("xyxy")

        classes = [1] * len(boxes) # only one class in my dataset
        classes = torch.tensor(classes)
        target.add_field("labels", classes)

        # masks : list of numpy array
        masks = SegmentationMask(masks, img.size)
        target.add_field("masks", masks)

        target = target.clip_to_image(remove_empty=True)

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target, idx

    def get_img_info(self, index):
        img_id = self.id_to_img_map[index]
        img_data = self.coco.imgs[img_id]
        return img_data

    def _get_insts_bbox_mask_from_mask(self, mask, third_object_color="red"):
        colors = np.unique(mask.reshape(-1, mask.shape[2]), axis=0)
        colors = [list(color) for color in colors]
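        if third_object_color == "red":
            abandon_colors = [[0, 0, 0], [0, 0, 255]]
        elif third_object_color == "pink":
            abandon_colors = [[0, 0, 0], [237, 199, 244]]  # pink as the 3rd object
        else:
            # without this branch, abandon_colors would be undefined below
            raise ValueError("unsupported third_object_color: {}".format(third_object_color))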
        inst_colors = [color for color in colors if color not in abandon_colors]

        boxes = []
        masks = []
        for i in range(len(inst_colors)):
            inst_mask = np.all(np.equal(mask, inst_colors[i]), axis=2)
            inst_mask = inst_mask.astype(np.uint8)  # boolean mask -> 0/1

            # kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT,(7, 7))
            # inst_mask = cv2.morphologyEx(inst_mask, cv2.MORPH_CLOSE, kernel_open)
            # kernel_close = cv2.getStructuringElement(cv2.MORPH_RECT,(7, 7))
            # inst_mask = cv2.morphologyEx(inst_mask, cv2.MORPH_CLOSE, kernel_close)

            box = self._bbox(inst_mask)
            box_area = self._area(box)
            if box_area >= 100 :
                y_min,y_max,x_min,x_max = box
                boxes.append([x_min, y_min, x_max-x_min, y_max-y_min])
                masks.append(inst_mask)

        return boxes, masks

    def _bbox(self, img):
        a = np.where(img != 0)
        bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
        return bbox # y_min,y_max,x_min,x_max

    def _area(self, box):
        return (box[1]-box[0]) * (box[3]-box[2])

fmassa commented 5 years ago

According to https://github.com/facebookresearch/maskrcnn-benchmark/pull/150/files#diff-928af5178eceaef7d662fe22c85f439aR209, because in your case masks is a list of numpy.ndarray, I'd imagine that you'd need to pass mode='mask' to the code in SegmentationMask for it to take the right path.

Here is one thing I'd do to verify that the code works as expected (without any transform in the dataset):

1. Get one instance of the dataset (for example dataset[0]).
2. From target, do a transformation like target.transpose(0) to flip it horizontally.
3. Do img.transpose(0) (img is a PIL image) to flip the image horizontally.
4. Compare the mask in target.get_field('masks') with the image and verify that they are both flipped.
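The same kind of consistency check can be sketched without the library at all. The snippet below is an illustration in plain NumPy, with hypothetical helper names (`bbox_xyxy`, `flip_box_horizontally`), and it assumes boxes hold inclusive pixel indices. It flips a mask and its box separately and checks that the flipped box still tightly encloses the flipped mask.

```python
import numpy as np

def bbox_xyxy(mask):
    """Tight inclusive box (x_min, y_min, x_max, y_max) around nonzero pixels."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def flip_box_horizontally(box, width):
    """Mirror an inclusive-pixel xyxy box; pixel x maps to width - 1 - x."""
    x_min, y_min, x_max, y_max = box
    return width - 1 - x_max, y_min, width - 1 - x_min, y_max

mask = np.zeros((6, 10), dtype=np.uint8)
mask[2:5, 1:4] = 1                      # instance near the left edge

flipped_mask = mask[:, ::-1]            # horizontal flip of the mask
flipped_box = flip_box_horizontally(bbox_xyxy(mask), mask.shape[1])

# If mask and box are flipped consistently, the two boxes agree.
boxes_agree = flipped_box == bbox_xyxy(flipped_mask)
print("flipped box:", flipped_box, "agrees with flipped mask:", boxes_agree)
```

If the two disagree by one pixel, that points to an off-by-one in one of the two flip implementations.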

txytju commented 5 years ago

I will try to verify that, thanks!

What's more, because segms[0] is a numpy array rather than a list, the mode is set to "mask" automatically, without you having to set it yourself.

class SegmentationMask(object):
    """
    This class stores the segmentations for all objects in the image
    """

    def __init__(self, segms, size, mode=None):
        """
        Arguments:
            segms: three types
                (1) polygons: a list of list of lists of numbers. The first
                level of the list correspond to individual instances,
                the second level to all the polygons that compose the
                object, and the third level to the polygon coordinates.
                (2) rles: COCO's run length encoding format, uncompressed or compressed
                (3) binary masks
            size: (width, height)
            mode: 'polygon', 'mask'. if mode is 'mask', convert mask of any format to binary mask
        """
        assert isinstance(segms, list)
        if type(segms[0]) != list:
            mode = 'mask'

fmassa commented 5 years ago

Oh, I missed the segms[0], I just saw it as segms. Sounds good then!

txytju commented 5 years ago

I have verified the transpose operation; it works.

fmassa commented 5 years ago

Cool!

Now if the rest of the training works out of the box with your dataset, then that is a good signal that we can be looking again into merging that PR

txytju commented 5 years ago

Yes. I will generate an offline dataset (instance masks) and train on my dataset. After that, if both polygon annotation and binary mask annotation work, maybe we should consider merging that PR. I will report the training result within 24 hours.

fmassa commented 5 years ago

You can increase the number of worker threads in the dataloader, so that you don't need to generate it offline - it will probably be simpler

txytju commented 5 years ago

OK. I have implemented both the online and offline methods and am currently using the online one. I tried to overfit a large model on only 2 images, but the result is not that good: the predicted mask seems to be shifted a few pixels to the right compared with the ground-truth mask, as you can see in these images. I have no idea what's wrong here. raw_image : https://ws2.sinaimg.cn/large/006tNbRwly1fy3u4tpu0kj30u0190qs3.jpg inst_1 : https://ws4.sinaimg.cn/large/006tNbRwly1fy3u4wtp88j30u0190neq.jpg inst_2 : https://ws3.sinaimg.cn/large/006tNbRwly1fy3u4vk8qnj30u0190qkh.jpg

fmassa commented 5 years ago

This might indicate that there are still a few problems with the current implementation in the Mask class. One thing I'd do: try transposing the masks twice. They should give the original result. If that's not the case, then the transposing is introducing some +1 shifts somewhere that should be fixed.

txytju commented 5 years ago

Thanks, I will try that! By the way, could this be caused by an inconsistency between the Mask class and the loss calculations in the main project, or something similar? Or can we be sure that if the Mask class is implemented correctly, it will work with the whole project?

fmassa commented 5 years ago

If the Mask class is perfectly implemented, then the rest of the codebase shouldn't be affected and it should work nicely.

txytju commented 5 years ago

I tried to transpose the BoxList twice and it turns out it gives the original result.

import numpy as np
from maskrcnn_benchmark.data.datasets.coco_binary_mask import COCODatasetBinaryMaskOnLine
ann_file = "path/to/data_binary_mask.json"
root = "path/to/dataset"
coco_binary_mask = COCODatasetBinaryMaskOnLine(ann_file, root, transforms=None)

img, target, _ = coco_binary_mask[1]  # index=1 for example
masks = target.get_field('masks').masks

f_f_target = target.transpose(0).transpose(0)
f_f_masks = f_f_target.get_field('masks').masks

# masks flipped twice
mask_1 = f_f_masks[0].mask.numpy()
mask_2 = f_f_masks[1].mask.numpy()
# original masks
mask_3 = masks[0].mask.numpy()
mask_4 = masks[1].mask.numpy()

print(np.all(np.equal(mask_1,mask_3)))
print(np.all(np.equal(mask_1,mask_4)))
print(np.all(np.equal(mask_2,mask_3)))
print(np.all(np.equal(mask_2,mask_4)))

# result is two `True` and two `False`, which means that the flipped masks are equal to the original ones.

What's more, I tried turning off data augmentation, but the trained model still predicts a shifted mask.

def build_transforms(cfg, is_train=True):
    if is_train:
        min_size = cfg.INPUT.MIN_SIZE_TRAIN
        max_size = cfg.INPUT.MAX_SIZE_TRAIN
        flip_prob = 0.5  # cfg.INPUT.FLIP_PROB_TRAIN
    else:
        min_size = cfg.INPUT.MIN_SIZE_TEST
        max_size = cfg.INPUT.MAX_SIZE_TEST
        flip_prob = 0

    to_bgr255 = cfg.INPUT.TO_BGR255
    normalize_transform = T.Normalize(
        mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=to_bgr255
    )

    transform = T.Compose(
        [
            T.Resize(min_size, max_size),
            T.ToTensor(),
            normalize_transform,
        ]
    )
    return transform

fmassa commented 5 years ago

@txytju One more thing to verify is that the crop matches exactly the results in here. Apart from that, I don't see any other cases where it should be a problem, but maybe the crop is missing a +1 or -1 offset somewhere?

txytju commented 5 years ago

@fmassa The only difference that I can see is [int(b) for b in box], which is in Mask.crop() but not in Polygon.crop(). Maybe the shift is caused by int()? However, in the Mask class we are operating on an image mask, whose indices must be of type int, right? So I have no idea how to solve this bug, and I have been stuck here for a few days...

fmassa commented 5 years ago

So, one thing to also take into account is that the transforms rescale the images during training so that they have a particular size. This downsampling (.resize()) could potentially be introducing a shift. Apart from that, I do not have any more ideas for now. Maybe @wangg12 knows a bit more, given that he has implemented (and potentially used) the Mask functions.

wangg12 commented 5 years ago

I think the +1 offset might explain the issue. In mmdetection, they use w = max(x2 - x1 +1) everywhere consistently, while it is not consistent in this implementation (maybe for historical reasons inherited from the Detectron implementation?). @fmassa Have you tried a version with the +1 everywhere, and how does it perform?

fmassa commented 5 years ago

The current implementation of Polygons here follows the implementation of Detectron, which has a legacy behavior of adding 1 when computing the width of boxes. That's something we kept for consistency with previous models.
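As a small illustration of what the two conventions mean, here is a sketch with a hypothetical helper `box_width`, which is not a function from either codebase. With the +1, coordinates are read as inclusive pixel indices. Without it, they are read as continuous edge positions. Mixing the two in one pipeline is one way a one-pixel discrepancy can appear.

```python
def box_width(x1, x2, legacy_plus_one=True):
    """Width of a box spanning x1..x2.

    legacy_plus_one=True treats x1 and x2 as inclusive pixel indices
    (the +1 convention discussed above); False treats them as edges.
    """
    return x2 - x1 + (1 if legacy_plus_one else 0)

# A box covering pixel columns 10 through 19.
w_legacy = box_width(10, 19)
w_plain = box_width(10, 19, legacy_plus_one=False)
print("with +1:", w_legacy, "without +1:", w_plain)
```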

txytju commented 5 years ago

@fmassa The shift problem in my dataset was solved by simply using scaled_mask = interpolate(self.mask[None, None, :, :], (height, width), mode='bilinear')[0, 0] rather than scaled_mask = interpolate(self.mask[None, None, :, :], (height, width), mode='nearest')[0, 0] in the Mask.resize() method.

I don't know exactly why that happens...

fmassa commented 5 years ago

I think nearest interpolation might bring those artifacts, but great to know that this was the solution for your case, it's very helpful!
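One plausible mechanism for the shift can be shown in one dimension. This is a sketch only: it assumes the nearest mode picks source index floor(dst_index * scale), which as far as I know is how `mode='nearest'` behaved in PyTorch at the time. The bilinear case is written by hand for an exact 2x downsample with aligned pixel centers. All helper names are hypothetical.

```python
import numpy as np

def downsample_nearest(row, out_len):
    """Nearest-neighbor resize using the floor(dst_index * scale) index rule."""
    scale = len(row) / out_len
    src_idx = np.floor(np.arange(out_len) * scale).astype(int)
    return row[src_idx]

def downsample_bilinear_2x(row):
    """Exact 2x bilinear downsample with aligned pixel centers: average each pair."""
    return row.reshape(-1, 2).mean(axis=1)

def centroid(row):
    """Intensity-weighted mean index of a 1-D signal."""
    return float((np.arange(len(row)) * row).sum() / row.sum())

row = np.array([0, 0, 0, 1, 1, 0, 0, 0], dtype=np.float32)

# Where the source centroid should land on the half-resolution grid
# (pixel-center convention).
expected = (centroid(row) + 0.5) / 2 - 0.5

nearest_centroid = centroid(downsample_nearest(row, 4))
bilinear_centroid = centroid(downsample_bilinear_2x(row))
print("expected:", expected, "nearest:", nearest_centroid, "bilinear:", bilinear_centroid)
```

In this toy case the bilinear result keeps the object centered, while the nearest result lands half an output pixel to the right. That is the same direction as the shift reported above, though it does not prove that this was the cause in the real pipeline.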

txytju commented 5 years ago

I trained my model on a small dataset containing 2 images, and it succeeded. But when I try to train on a larger dataset, I get an out-of-memory error.

dataloader done!
2018-12-18 19:05:07,871 maskrcnn_benchmark.trainer INFO: Start training
2018-12-18 19:07:01,964 maskrcnn_benchmark.trainer INFO: eta: 2 days, 15:21:06  iter: 20  loss: 1.7994 (2.8083)  loss_classifier: 0.1732 (0.4978)  loss_box_reg: 0.0491 (0.0551)  loss_mask: 1.2106 (2.0742)  loss_objectness: 0.0674 (0.1511)  loss_rpn_box_reg: 0.0117 (0.0301)  time: 4.5573 (5.7045)  data: 4.0758 (5.1670)  lr: 0.001076  max mem: 2071
Traceback (most recent call last):
  File "tools/train_net.py", line 206, in <module>
    main()
  File "tools/train_net.py", line 199, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 108, in train
    arguments,
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 71, in do_train
    loss_dict = model(images, targets)
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 100, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 119, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 91, in __call__
    labels, regression_targets = self.prepare_targets(anchors, targets)
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 55, in prepare_targets
    anchors_per_image, targets_per_image
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 37, in match_targets_to_anchors
    match_quality_matrix = boxlist_iou(target, anchor)
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 79, in boxlist_iou
    lt = torch.max(box1[:, None, :2], box2[:, :2])  # [N,M,2]
RuntimeError: CUDA out of memory. Tried to allocate 1.63 GiB (GPU 0; 10.91 GiB total capacity; 1.65 GiB already allocated; 1.56 GiB free; 167.17 MiB cached)

I'm using one GPU and one image per batch.

By the way, I am using offline dataset.


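import cv2  # used below to read the pre-generated instance masks; other imports as in the online dataset above

class COCODatasetBinaryMaskOffLine(torchvision.datasets.coco.CocoDetection):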
    '''
    Like the COCO dataset, but uses binary masks as instance annotations rather than polygons.
    This class reads instance masks generated offline (before training), which is fast.
    Check tools/datasets/data_generate_utils.py for annotation generation.
    '''

    def __init__(
        self, ann_file, root, transforms=None
    ):
        super(COCODatasetBinaryMaskOffLine, self).__init__(root, ann_file)

        # sort indices for reproducible results
        self.ids = sorted(self.ids)
        self.json_category_id_to_contiguous_id = {
            v: i + 1 for i, v in enumerate(self.coco.getCatIds())
        }
        self.contiguous_category_id_to_json_id = {
            v: k for k, v in self.json_category_id_to_contiguous_id.items()
        }
        self.id_to_img_map = {k: v for k, v in enumerate(self.ids)}
        self.transforms = transforms
        self.root = root
        self.image_root = self.root + "images/"
        self.mask_root = self.root + "masks/"
        self.inst_mask_root = self.root + "insts_masks/"

        image_names = [image_name.split(".")[0] for image_name in os.listdir(self.image_root) if ".jpg" in image_name]
        mask_names = [mask_name.replace("_mask","").split(".")[0] for mask_name in os.listdir(self.mask_root) if ".png" in mask_name]
        self.names = list(set(image_names) & set(mask_names))

    def __getitem__(self, idx):

        name = self.names[idx]
        image_path = self.image_root + name + ".jpg"
        img = Image.open(image_path)

        boxes, masks = self._get_insts_bbox_mask_offline(name)

        # boxes : a list of list [[x,y,w,h],[x,y,w,h],[...],[...],]
        boxes = torch.as_tensor(boxes).reshape(-1, 4)  # guard against no boxes
        target = BoxList(boxes, img.size, mode="xywh").convert("xyxy")
        classes = [1] * len(boxes)
        classes = torch.tensor(classes)
        target.add_field("labels", classes)
        masks = SegmentationMask(masks, img.size)  # masks : list of numpy array
        target.add_field("masks", masks)
        target = target.clip_to_image(remove_empty=True)
        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target, idx

    def get_img_info(self, index):
        img_id = self.id_to_img_map[index]
        img_data = self.coco.imgs[img_id]
        return img_data

    def _get_insts_bbox_mask_offline(self, image_name):
        boxes = []
        masks = []
        instance_mask_names = [name for name in os.listdir(self.inst_mask_root) if image_name in name]
        for instance_mask_name in instance_mask_names:
            instance_mask_path = self.inst_mask_root + instance_mask_name
            instance_mask = cv2.imread(instance_mask_path)
            instance_mask = instance_mask[:,:,0]
            instance_mask = np.where(instance_mask==255, 1, 0)
            masks.append(instance_mask)
            y_min,y_max,x_min,x_max = self._bbox(instance_mask)
            boxes.append([x_min, y_min, x_max-x_min, y_max-y_min])

        return boxes, masks 

    def _bbox(self, img):
        a = np.where(img != 0)
        bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
        return bbox # y_min,y_max,x_min,x_max

fmassa commented 5 years ago

The reason is that you probably have a lot of GT per image. I'd recommend moving the box_iou computation to happen on the CPU, as discussed in https://github.com/facebookresearch/maskrcnn-benchmark/issues/18
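To see why the number of ground-truth boxes matters, here is a rough back-of-the-envelope sketch. The anchor count is a hypothetical placeholder, not a value measured in this thread. The line in the traceback builds an [N, M, 2] float tensor, so its size grows linearly with the number of GT boxes N for a fixed number of anchors M.

```python
def iou_intermediate_bytes(num_gt, num_anchors, bytes_per_float=4):
    """Bytes for one [N, M, 2] float32 intermediate, like `lt` in boxlist_iou."""
    return num_gt * num_anchors * 2 * bytes_per_float

NUM_ANCHORS = 200_000  # hypothetical anchor count for a single image

for num_gt in (5, 100, 1000):
    mib = iou_intermediate_bytes(num_gt, NUM_ANCHORS) / 2**20
    print(f"{num_gt:>5} GT boxes -> {mib:8.1f} MiB per [N, M, 2] intermediate")
```

Under these assumptions a handful of boxes costs a few MiB, while an image with on the order of a thousand boxes needs more than a GiB for a single intermediate, and boxlist_iou creates several of them.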

txytju commented 5 years ago

Thanks for that. One thing I want to make sure of is that the OOM is not caused by the Dataloader loading all data before training, right? Is it true that we load only the current batch of images and targets during training?

I think there will not be more than 5 instances per image, and both GPU memory and CPU memory are large.

fmassa commented 5 years ago

The OOM is happening on the GPU, so it's probably not related to data loading I believe. The data loader usually acts on CPU data.

txytju commented 5 years ago

Update : I set NUM_WORKERS = 0 and this problem is gone.

I encountered OOM when loading data, as follows.

Traceback (most recent call last):
  File "tools/train_net.py", line 206, in <module>
    main()
  File "tools/train_net.py", line 199, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 108, in train
    arguments,
  File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    idx, batch = self._get_batch()
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
    return self.data_queue.get()
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 274, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 10507) is killed by signal: Killed.

txytju commented 5 years ago

I have carried out a few experiments. When I use a dataset of only a few images (like 2 or 5), training works well. However, when I try to train on a large dataset, there is always a GPU OOM problem. So I think the cause may not be too many instances in an image; rather, I may have stored some data about the whole dataset in GPU memory, which gives me a GPU OOM with a large dataset. Could you please give me a hint about where to debug? Thanks in advance!

fmassa commented 5 years ago

Depending on how you stored the data in the dataset (for example a numpy array), each worker will copy the whole object to each new thread, making it require a lot of CPU memory. If those are torch tensors, you should be fine.

I'd recommend not loading the full dataset in memory, but instead loading it at every __getitem__ call.
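A minimal sketch of that pattern follows, using a hypothetical `LazyMaskDataset` class and a stand-in loader so that it runs without any image files. The dataset object keeps only file names, so copying it into worker processes stays cheap, and the actual decoding happens per sample.

```python
class LazyMaskDataset:
    """Keeps only file names in memory; decodes one sample per __getitem__ call."""

    def __init__(self, names, load_sample):
        self.names = list(names)        # small: strings only
        self.load_sample = load_sample  # hypothetical callable: name -> (image, target)

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        image, target = self.load_sample(self.names[idx])
        return image, target, idx

# Stand-in loader that records which samples were actually read.
loaded = []

def fake_loader(name):
    loaded.append(name)
    return f"image:{name}", f"target:{name}"

dataset = LazyMaskDataset(["a", "b", "c"], fake_loader)
sample = dataset[1]
print("loaded so far:", loaded)
```

In a real dataset, `load_sample` would open the image and its instance masks from disk, as the `__getitem__` methods posted earlier in this thread do.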

txytju commented 5 years ago

Thanks for your patience! It turns out that some images contained too many instances, which was the result of bad image labeling. After removing those images and labels, the training works well.

fmassa commented 5 years ago

Awesome, thanks!

So, to summarize, the only thing that you had to change in order for your training to work as expected (on top of the PR adding better mask support) is to change the interpolation mode to bilinear instead of nearest, is that right?

txytju commented 5 years ago

Sorry for the late reply. Yes, after the change of interpolation, it works for me. However, resize and crop operations on a binary mask can only use int coordinates, whereas polygons can use float coordinates. So if the mask resolution is small (like 28), the ground-truth mask derived from a binary mask is not as good as one derived from polygons.

fmassa commented 5 years ago

@txytju sounds good, thanks for the information!

JoyHuYY1412 commented 5 years ago

> Sorry for the late reply. Yes, after the change of interpolation, it works for me. However, resize and crop operations on a binary mask can only use int coordinates, whereas polygons can use float coordinates. So if the mask resolution is small (like 28), the ground-truth mask derived from a binary mask is not as good as one derived from polygons.

Could you merge your code? I think your work is awesome. Also, is there any schedule for supporting RLE-format masks? Some COCO datasets use the RLE format. I think it would be nice, since then we wouldn't have to extract instances from PNG images.

fmassa commented 5 years ago

@JoyHuYY1412 I'm willing to merge the PR that adds support for it, I'd just ask for it to have unit tests so that we know we are computing the same things for polygons and masks

botcs commented 5 years ago

I could help with the unit tests, this PR would be useful for many I think

fmassa commented 5 years ago

@botcs yes, please!

botcs commented 5 years ago

So, could you please make a list of tests that should be done? I am currently running my DIY script for GTA5->COCO polygon... and trust me I have time :D

Another thing: if this module is merged, then can we use the evaluation tool, tools/test_net.py?

fmassa commented 5 years ago

Here are a few tests I think would be useful to have:

Starting from the same polygon, converting it to a mask and then performing an operation (resize / transpose / crop) on the mask should give the same (or almost the same) result as performing the operation on the polygon and then converting to a mask. This will test that the transforms match up to a factor.
Test that training a quick_schedule model using masks gives similar results to training with polygons. This doesn't need to be a unit test, but should be verified.
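Since the results only need to be the same "or almost", such a test could compare masks with a tolerance instead of exact equality. The sketch below uses IoU for that. The helper names and the 0.9 threshold are illustrative assumptions, not values agreed on in this thread.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as identical
    return float(np.logical_and(a, b).sum() / union)

def masks_almost_equal(a, b, min_iou=0.9):
    """True when the masks overlap at least min_iou (hypothetical threshold)."""
    return mask_iou(a, b) >= min_iou

mask_a = np.zeros((10, 10), dtype=np.uint8)
mask_a[2:8, 2:8] = 1                 # 36 foreground pixels
mask_b = mask_a.copy()
mask_b[2, 2] = 0                     # differs by one border pixel

exactly_equal = bool(np.array_equal(mask_a, mask_b))
almost_equal = masks_almost_equal(mask_a, mask_b)
print("exact:", exactly_equal, "almost:", almost_equal)
```

An exact comparison fails on a single differing border pixel, while the IoU check still passes, which may be closer to what "the same (or almost)" asks for.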

If support for masks is added, then it will be very simple to be able to use it either in tools/train_net.py or tools/test_net.py

Let me know if you have further questions!

botcs commented 5 years ago

Okay, I still have a few questions, just for clarification:

If PR #150 already has a test called test_segmentation_mask.py with the mentioned tests implemented (resize / transpose / crop / convert), why is it required to re-implement them?
I have pulled PR #150 and run the test, with the following failures. As far as I can see, the test is implemented correctly, so I suppose the PR is not yet ready for merge.

../home/csbotos/anaconda3/envs/debugmask/lib/python3.7/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
diff resize:  tensor(210.)
F0 tensor(218.)
F
======================================================================
FAIL: test_resize (__main__.TestSegmentationMask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_segmentation_mask.py", line 51, in test_resize
    self.assertTrue(torch.equal(mask_from_poly_resize, mask_resize))
AssertionError: False is not true

======================================================================
FAIL: test_transpose (__main__.TestSegmentationMask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_segmentation_mask.py", line 44, in test_transpose
    self.assertTrue(torch.equal(mask_flip, mask_from_poly_flip))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 4 tests in 0.021s

FAILED (failures=2)

In @txytju's comment there is a data loader: should it be part of the PR or not?

[Edit]: I will continue this thread at #150 and will get back to this when unit tests are done

wangg12 commented 5 years ago

@fmassa @botcs I think it is hard to make the behavior of binary mask and polygons exactly the same apart from the old detectron inconsistency.

PR #150 alone may not be enough for binary masks to work as well as polygons, since this codebase was optimized for polygon-based input.

A possible good practice would be to try PR #150 and make the necessary modifications so that COCO performance with binary masks is as good as with polygons (e.g., by adding a global config flag to switch between these two modes). You may refer to mmdetection for the necessary changes, since it inherently uses binary masks.

fmassa commented 5 years ago

Thanks for your comments @wangg12 ! About the codebase being optimized for polygons, do you mean that the results were optimized using polygons? Because runtime-wise, it shouldn't be too different maybe? But it would use more memory, for sure.

@botcs I've commented in #150 , let me know what you think.

oguchi-ebube commented 4 years ago

Hello, I am trying to do a background-color-based segmentation of a soccer pitch, so that I can detect just the entities on the field without any noise, using Detectron2. I have been trying to mask with the green HSV color range, but I have issues using the masked frames in Detectron2 for any analysis. Any thoughts on how to go about this, please?

facebookresearch / maskrcnn-benchmark

Support instance mask annotation with mask.png #256