Open txytju opened 5 years ago
There is an open PR that adds support for binary masks in https://github.com/facebookresearch/maskrcnn-benchmark/pull/150
You can try it out to see if it works for your use-case. I still haven't had the time to pull it down and try it out for myself though, that's why I haven't merged the PR yet.
OK, I will try it today and report here if it works. In my opinion, binary masks are a better representation than polygons, and I think they should be the default form of instance mask.
I have tested the code and written a corresponding dataloader for image-mask input data, modified from COCODataset. Could you help check it, especially the correspondence between the image information (such as size) and the image itself? Also, generating instance masks online (during training) is quite slow (about 10 times slower than polygons), so once I have confirmed that the dataloader logic is right, I will move instance mask generation offline. Thank you in advance. If the code works, maybe #150 should be merged into the master branch.
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import os

import numpy as np
import torch
import torchvision
from PIL import Image

from maskrcnn_benchmark.structures.bounding_box import BoxList
from maskrcnn_benchmark.structures.segmentation_mask import SegmentationMask


class COCODatasetBinaryMask(torchvision.datasets.coco.CocoDetection):
    def __init__(self, ann_file, root, transforms=None):
        super(COCODatasetBinaryMask, self).__init__(root, ann_file)
        # sort indices for reproducible results
        self.ids = sorted(self.ids)

        self.json_category_id_to_contiguous_id = {
            v: i + 1 for i, v in enumerate(self.coco.getCatIds())
        }
        self.contiguous_category_id_to_json_id = {
            v: k for k, v in self.json_category_id_to_contiguous_id.items()
        }
        self.id_to_img_map = {k: v for k, v in enumerate(self.ids)}
        self.transforms = transforms

        # root is expected to end with a path separator
        self.root = root
        self.image_root = self.root + "images/"
        self.mask_root = self.root + "masks/"

        image_names = [
            image_name.split(".")[0]
            for image_name in os.listdir(self.image_root)
            if ".jpg" in image_name
        ]
        mask_names = [
            mask_name.replace("_mask", "").split(".")[0]
            for mask_name in os.listdir(self.mask_root)
            if ".png" in mask_name
        ]
        # keep only names that have both an image and a mask; sorted so that
        # the index -> name mapping is the same on every run
        self.names = sorted(set(image_names) & set(mask_names))

    def __getitem__(self, idx):
        name = self.names[idx]
        image_path = self.image_root + name + ".jpg"
        mask_path = self.mask_root + name + "_mask.png"

        img = Image.open(image_path)
        mask = np.array(Image.open(mask_path))

        boxes, masks = self._get_insts_bbox_mask_from_mask(mask, third_object_color="red")

        # boxes : a list of list [[x,y,w,h],[x,y,w,h],[...],[...],]
        boxes = torch.as_tensor(boxes).reshape(-1, 4)  # guard against no boxes
        target = BoxList(boxes, img.size, mode="xywh").convert("xyxy")

        classes = [1] * len(boxes)  # only one class in my dataset
        classes = torch.tensor(classes)
        target.add_field("labels", classes)

        # masks : list of numpy array
        masks = SegmentationMask(masks, img.size)
        target.add_field("masks", masks)

        target = target.clip_to_image(remove_empty=True)

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target, idx

    def get_img_info(self, index):
        img_id = self.id_to_img_map[index]
        img_data = self.coco.imgs[img_id]
        return img_data

    def _get_insts_bbox_mask_from_mask(self, mask, third_object_color="red"):
        # mask is an (H, W, C) array in which every instance has a unique color
        colors = np.unique(mask.reshape(-1, mask.shape[2]), axis=0)
        colors = [list(color) for color in colors]

        # colors that do not correspond to an instance
        if third_object_color == "red":
            abandon_colors = [[0, 0, 0], [0, 0, 255]]
        elif third_object_color == "pink":
            abandon_colors = [[0, 0, 0], [237, 199, 244]]  # pink as the 3rd object
        else:
            raise ValueError("unsupported third_object_color: %s" % third_object_color)

        inst_colors = [color for color in colors if color not in abandon_colors]

        boxes = []
        masks = []
        for inst_color in inst_colors:
            # 1 where the pixel has this instance's color, 0 elsewhere
            inst_mask = np.all(np.equal(mask, inst_color), axis=2).astype(np.uint8)

            # kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT,(7, 7))
            # inst_mask = cv2.morphologyEx(inst_mask, cv2.MORPH_CLOSE, kernel_open)
            # kernel_close = cv2.getStructuringElement(cv2.MORPH_RECT,(7, 7))
            # inst_mask = cv2.morphologyEx(inst_mask, cv2.MORPH_CLOSE, kernel_close)

            box = self._bbox(inst_mask)
            box_area = self._area(box)
            if box_area >= 100:  # skip tiny instances
                y_min, y_max, x_min, x_max = box
                boxes.append([x_min, y_min, x_max - x_min, y_max - y_min])
                masks.append(inst_mask)
        return boxes, masks

    def _bbox(self, img):
        a = np.where(img != 0)
        bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
        return bbox  # y_min,y_max,x_min,x_max

    def _area(self, box):
        return (box[1] - box[0]) * (box[3] - box[2])
According to https://github.com/facebookresearch/maskrcnn-benchmark/pull/150/files#diff-928af5178eceaef7d662fe22c85f439aR209, because in your case masks is a list of numpy.ndarray, I'd imagine that you'd need to pass mode='mask' to the code in SegmentationMask for it to take the right path.
Here is one thing I'd do to verify that the code works as expected (without any transform in the dataset):
1. get an image and its target (for example with dataset[0])
2. on the target, do a transformation like target.transpose(0) to flip horizontally
3. use img.transpose(0) (img is a PIL image) to flip the image horizontally
4. plot target.get_field('masks') with the image and verify that they are both flipped
I will try to verify that, thanks!
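The flip check suggested above can be sketched without the library. This is a minimal NumPy-only illustration, not the maskrcnn-benchmark API: `bbox_from_mask` and `flip_box_horizontally` are hypothetical helpers, and the box uses exclusive right and bottom edges, which is an assumption made only to keep the arithmetic simple.

```python
import numpy as np


def bbox_from_mask(mask):
    """Return (x_min, y_min, x_max, y_max) with exclusive max edges (assumed convention)."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


def flip_box_horizontally(box, width):
    """Mirror a box around the vertical center line of an image of the given width."""
    x_min, y_min, x_max, y_max = box
    return width - x_max, y_min, width - x_min, y_max


# A small binary mask with one rectangular instance.
mask = np.zeros((6, 10), dtype=np.uint8)
mask[1:4, 2:5] = 1

box = bbox_from_mask(mask)
flipped_mask = mask[:, ::-1]  # horizontal flip of the mask
flipped_box = flip_box_horizontally(box, mask.shape[1])

# The box recomputed from the flipped mask must agree with the flipped box,
# and flipping twice must give back the original mask.
boxes_agree = bbox_from_mask(flipped_mask) == flipped_box
double_flip_is_identity = np.array_equal(flipped_mask[:, ::-1], mask)
print(box, flipped_box, boxes_agree, double_flip_is_identity)
```

If the two checks disagree for a real mask class, the flip of either the boxes or the masks is off by a pixel somewhere.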
What's more, because segms[0] is a numpy array rather than a list, the mode is set to "mask" automatically, without you having to set it yourself.
class SegmentationMask(object):
    """
    This class stores the segmentations for all objects in the image
    """

    def __init__(self, segms, size, mode=None):
        """
        Arguments:
            segms: three types
                (1) polygons: a list of list of lists of numbers. The first
                    level of the list correspond to individual instances,
                    the second level to all the polygons that compose the
                    object, and the third level to the polygon coordinates.
                (2) rles: COCO's run length encoding format, uncompressed or compressed
                (3) binary masks
            size: (width, height)
            mode: 'polygon', 'mask'. if mode is 'mask', convert mask of any format to binary mask
        """
        assert isinstance(segms, list)
        if type(segms[0]) != list:
            mode = 'mask'
Oh, I missed the segms[0], I just saw it as segms. Sounds good then!
I have verified the transpose operation; it works.
Cool!
Now if the rest of the training works out of the box with your dataset, then that is a good signal that we can look again into merging that PR.
Yes. I will generate an offline dataset (instance masks) and train on my dataset. After that, if both polygon annotations and binary mask annotations work, maybe we should consider merging that PR. I will report the training result within 24 hours.
You can increase the number of worker threads in the dataloader, so that you don't need to generate it offline - it will probably be simpler
OK. I have implemented both the online and offline methods and am currently using the online one. I tried to overfit a large model on only 2 images, but the result is not that good. The predicted mask seems to be shifted a few pixels to the right compared with the ground-truth mask, as you can see in these images. I have no idea what's wrong here.
raw_image: https://ws2.sinaimg.cn/large/006tNbRwly1fy3u4tpu0kj30u0190qs3.jpg
inst_1: https://ws4.sinaimg.cn/large/006tNbRwly1fy3u4wtp88j30u0190neq.jpg
inst_2: https://ws3.sinaimg.cn/large/006tNbRwly1fy3u4vk8qnj30u0190qkh.jpg
This might indicate that there are still a few problems with the current implementation in the Mask class.
One thing I'd do: try transposing the masks twice. They should give the original result. If that's not the case, then the transposing is introducing some +1 shifts somewhere that should be fixed.
Thanks, I will try that!
By the way, could this be caused by an inconsistency between Mask and the loss calculations in the main project, or something similar? Or can we be sure that if the Mask class is perfectly implemented, it will work with the whole project?
If the Mask class is perfectly implemented, then the rest of the codebase shouldn't be affected and it should work nicely.
I tried to transpose the BoxList twice and it turns out it gives the original result.
import numpy as np

from maskrcnn_benchmark.data.datasets.coco_binary_mask import COCODatasetBinaryMaskOnLine

ann_file = "path/to/data_binary_mask.json"
root = "path/to/dataset"
coco_binary_mask = COCODatasetBinaryMaskOnLine(ann_file, root, transforms=None)

img, target, _ = coco_binary_mask[1]  # index=1 for example
masks = target.get_field('masks').masks

# flip horizontally twice
f_f_target = target.transpose(0).transpose(0)
f_f_masks = f_f_target.get_field('masks').masks

# masks flipped twice
mask_1 = f_f_masks[0].mask.numpy()
mask_2 = f_f_masks[1].mask.numpy()

# original masks
mask_3 = masks[0].mask.numpy()
mask_4 = masks[1].mask.numpy()

print(np.all(np.equal(mask_1, mask_3)))
print(np.all(np.equal(mask_1, mask_4)))
print(np.all(np.equal(mask_2, mask_3)))
print(np.all(np.equal(mask_2, mask_4)))
# result is two `True` and two `False`, which means that the twice-flipped masks are equal to the original ones.
What's more, I tried turning off data augmentation, but the trained model still predicts a shifted mask.
def build_transforms(cfg, is_train=True):
    if is_train:
        min_size = cfg.INPUT.MIN_SIZE_TRAIN
        max_size = cfg.INPUT.MAX_SIZE_TRAIN
        flip_prob = 0.5  # cfg.INPUT.FLIP_PROB_TRAIN
    else:
        min_size = cfg.INPUT.MIN_SIZE_TEST
        max_size = cfg.INPUT.MAX_SIZE_TEST
        flip_prob = 0

    to_bgr255 = cfg.INPUT.TO_BGR255
    normalize_transform = T.Normalize(
        mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=to_bgr255
    )

    # flip_prob is not used below: the random horizontal flip is left out
    # of the pipeline to turn augmentation off
    transform = T.Compose(
        [
            T.Resize(min_size, max_size),
            T.ToTensor(),
            normalize_transform,
        ]
    )
    return transform
@txytju One more thing to verify is that the crop matches exactly the results in here.
Apart from that, I don't see any other cases where it should be a problem, but maybe the crop is missing a +1 or -1 offset somewhere?
@fmassa The only difference that I can see is [int(b) for b in box] in Mask.crop() but not in Polygon.crop(). Maybe the shift is caused by int()? However, in the Mask class we are operating on an image mask, whose indices must be of type int, right? So I have no idea how to solve this bug, and I have been stuck here for a few days...
So, one thing to also take into account is that the transforms rescale the images during training so that they have a particular size. This downsampling (.resize()) could potentially be introducing a shift.
Apart from that, I do not have any more ideas for now. Maybe @wangg12 knows a bit more, given that he has implemented (and potentially used) the Mask functions.
I think the +1 offset might explain the issue. In mmdetection, they use w = max(x2 - x1 +1) everywhere consistently, while it is not consistent in this implementation (maybe for historical reasons from the implementation of Detectron?). @fmassa Have you tried the +1 everywhere version, and how does it perform?
The current implementation of Polygons here follows the implementation of Detectron, a legacy behavior that adds 1 when computing the width of boxes. But that's something we kept for consistency with previous models.
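To make the +1 discussion concrete, here is a small sketch of how mixing the two width conventions moves a box edge by one pixel. The helper names and the `to_remove` parameter are hypothetical and written only for this illustration; they are not copied from maskrcnn-benchmark, Detectron, or mmdetection.

```python
def xyxy_to_xywh(box, to_remove=1):
    """Corner format to (x, y, w, h); to_remove=1 treats x2, y2 as inclusive pixel indices."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1 + to_remove, y2 - y1 + to_remove)


def xywh_to_xyxy(box, to_remove=1):
    """Inverse conversion; it must use the same to_remove to round-trip."""
    x, y, w, h = box
    return (x, y, x + w - to_remove, y + h - to_remove)


box = (10, 20, 19, 39)

# Consistent convention: the round trip is the identity.
consistent = xywh_to_xyxy(xyxy_to_xywh(box, to_remove=1), to_remove=1)

# Mixed conventions: the box grows by one pixel on the right and bottom.
mixed = xywh_to_xyxy(xyxy_to_xywh(box, to_remove=1), to_remove=0)

print(consistent)  # (10, 20, 19, 39)
print(mixed)       # (10, 20, 20, 40)
```

Either convention works on its own; the drift appears only when one code path adds the 1 and another does not.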
@fmassa The shift problem in my dataset has been solved by just using
scaled_mask = interpolate(self.mask[None, None, :, :], (height, width), mode='bilinear')[0, 0]
rather than
scaled_mask = interpolate(self.mask[None, None, :, :], (height, width), mode='nearest')[0, 0]
in the Mask.resize() method.
I don't know exactly why that happens...
I think nearest interpolation might bring those artifacts, but great to know that this was the solution for your case, it's very helpful!
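One possible explanation, offered as an assumption rather than a diagnosis of this codebase: to my understanding, the legacy 'nearest' mode picks the source pixel at floor(dst * in_size / out_size), that is, from the left edge of each output pixel rather than its center. When downsampling, that places content slightly to the right on average. The NumPy sketch below compares that floor rule with a center-based rule on a 1-D mask. It does not call PyTorch and it is not the bilinear path used in the fix above; all function names are hypothetical.

```python
import numpy as np


def floor_rule_indices(in_size, out_size):
    """Source index for each output pixel: floor(dst * in / out)."""
    return (np.arange(out_size) * in_size) // out_size


def center_rule_indices(in_size, out_size):
    """Source index at each output pixel's center: floor((dst + 0.5) * in / out)."""
    return ((2 * np.arange(out_size) + 1) * in_size) // (2 * out_size)


def mean_shift(index_fn, in_size=48, out_size=16, width=12):
    """Average centroid displacement, in output pixels, over all mask offsets."""
    scale = in_size / out_size
    src_idx = index_fn(in_size, out_size)
    shifts = []
    for start in range(in_size - width):
        row = np.zeros(in_size)
        row[start:start + width] = 1
        resized = row[src_idx]  # nearest-neighbour downsampling
        resized_centroid = np.nonzero(resized)[0].mean() + 0.5
        true_centroid = (start + width / 2) / scale
        shifts.append(resized_centroid - true_centroid)
    return float(np.mean(shifts))


floor_shift = mean_shift(floor_rule_indices)
center_shift = mean_shift(center_rule_indices)
print(f"floor rule: {floor_shift:.3f} px, center rule: {abs(center_shift):.3f} px")
```

With these sizes the floor rule moves the mask about a third of an output pixel to the right on average, while the center rule has no average shift. Repeated resizes (image level, then mask level) could compound such an offset, which would be consistent with the symptom reported earlier in the thread.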
I trained my model on a small dataset that contains 2 images and it succeeded. But when I try to train on a larger dataset, an out-of-memory error occurs.
dataloader done!
2018-12-18 19:05:07,871 maskrcnn_benchmark.trainer INFO: Start training
2018-12-18 19:07:01,964 maskrcnn_benchmark.trainer INFO: eta: 2 days, 15:21:06 iter: 20 loss: 1.7994 (2.8083) loss_classifier: 0.1732 (0.4978) loss_box_reg: 0.0491 (0.0551) loss_mask: 1.2106 (2.0742) loss_objectness: 0.0674 (0.1511) loss_rpn_box_reg: 0.0117 (0.0301) time: 4.5573 (5.7045) data: 4.0758 (5.1670) lr: 0.001076 max mem: 2071
Traceback (most recent call last):
File "tools/train_net.py", line 206, in <module>
main()
File "tools/train_net.py", line 199, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 108, in train
arguments,
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 71, in do_train
loss_dict = model(images, targets)
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 100, in forward
return self._forward_train(anchors, objectness, rpn_box_regression, targets)
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 119, in _forward_train
anchors, objectness, rpn_box_regression, targets
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 91, in __call__
labels, regression_targets = self.prepare_targets(anchors, targets)
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 55, in prepare_targets
anchors_per_image, targets_per_image
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 37, in match_targets_to_anchors
match_quality_matrix = boxlist_iou(target, anchor)
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 79, in boxlist_iou
lt = torch.max(box1[:, None, :2], box2[:, :2]) # [N,M,2]
RuntimeError: CUDA out of memory. Tried to allocate 1.63 GiB (GPU 0; 10.91 GiB total capacity; 1.65 GiB already allocated; 1.56 GiB free; 167.17 MiB cached)
I'm using one GPU and one image per batch.
By the way, I am using offline dataset.
import cv2  # used by _get_insts_bbox_mask_offline


class COCODatasetBinaryMaskOffLine(torchvision.datasets.coco.CocoDetection):
    '''
    Like the COCO dataset, but uses binary masks as instance annotations rather than polygons.
    This class loads instance masks that were generated offline, which is fast during training.
    Check tools/datasets/data_generate_utils.py for annotation generation.
    '''

    def __init__(self, ann_file, root, transforms=None):
        super(COCODatasetBinaryMaskOffLine, self).__init__(root, ann_file)
        # sort indices for reproducible results
        self.ids = sorted(self.ids)

        self.json_category_id_to_contiguous_id = {
            v: i + 1 for i, v in enumerate(self.coco.getCatIds())
        }
        self.contiguous_category_id_to_json_id = {
            v: k for k, v in self.json_category_id_to_contiguous_id.items()
        }
        self.id_to_img_map = {k: v for k, v in enumerate(self.ids)}
        self.transforms = transforms

        # root is expected to end with a path separator
        self.root = root
        self.image_root = self.root + "images/"
        self.mask_root = self.root + "masks/"
        self.inst_mask_root = self.root + "insts_masks/"

        image_names = [
            image_name.split(".")[0]
            for image_name in os.listdir(self.image_root)
            if ".jpg" in image_name
        ]
        mask_names = [
            mask_name.replace("_mask", "").split(".")[0]
            for mask_name in os.listdir(self.mask_root)
            if ".png" in mask_name
        ]
        # sorted so that the index -> name mapping is the same on every run
        self.names = sorted(set(image_names) & set(mask_names))

    def __getitem__(self, idx):
        name = self.names[idx]
        image_path = self.image_root + name + ".jpg"
        img = Image.open(image_path)

        boxes, masks = self._get_insts_bbox_mask_offline(name)

        # boxes : a list of list [[x,y,w,h],[x,y,w,h],[...],[...],]
        boxes = torch.as_tensor(boxes).reshape(-1, 4)  # guard against no boxes
        target = BoxList(boxes, img.size, mode="xywh").convert("xyxy")

        classes = [1] * len(boxes)
        classes = torch.tensor(classes)
        target.add_field("labels", classes)

        masks = SegmentationMask(masks, img.size)  # masks : list of numpy array
        target.add_field("masks", masks)

        target = target.clip_to_image(remove_empty=True)

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target, idx

    def get_img_info(self, index):
        img_id = self.id_to_img_map[index]
        img_data = self.coco.imgs[img_id]
        return img_data

    def _get_insts_bbox_mask_offline(self, image_name):
        boxes = []
        masks = []
        # every instance of an image is stored as its own mask file
        instance_mask_names = [
            name for name in os.listdir(self.inst_mask_root) if image_name in name
        ]
        for instance_mask_name in instance_mask_names:
            instance_mask_path = self.inst_mask_root + instance_mask_name
            instance_mask = cv2.imread(instance_mask_path)
            instance_mask = instance_mask[:, :, 0]
            # pixels stored as 255 belong to the instance
            instance_mask = np.where(instance_mask == 255, 1, 0)
            masks.append(instance_mask)

            y_min, y_max, x_min, x_max = self._bbox(instance_mask)
            boxes.append([x_min, y_min, x_max - x_min, y_max - y_min])
        return boxes, masks

    def _bbox(self, img):
        a = np.where(img != 0)
        bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
        return bbox  # y_min,y_max,x_min,x_max
The reason is that you probably have a lot of GT per image. I'd recommend moving the box_iou computation to happen on the CPU, as discussed in https://github.com/facebookresearch/maskrcnn-benchmark/issues/18
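A rough way to see why the number of ground-truth boxes matters: the line that fails in the traceback above builds an [N, M, 2] tensor for N ground-truth boxes and M anchors. The sketch below only does the arithmetic for one such float32 intermediate. The anchor count is an assumed order of magnitude, not a value taken from this thread, and `pairwise_corner_bytes` is a hypothetical helper.

```python
def pairwise_corner_bytes(num_gt, num_anchors, bytes_per_value=4):
    """Bytes for one [N, M, 2] float32 intermediate such as `lt` in boxlist_iou."""
    return num_gt * num_anchors * 2 * bytes_per_value


GIB = 1024 ** 3
num_anchors = 200_000  # assumption: order of magnitude only, depends on model and image size

for num_gt in (5, 100, 1000):
    size = pairwise_corner_bytes(num_gt, num_anchors)
    print(f"{num_gt:>5} GT boxes -> {size / GIB:.3f} GiB per intermediate")
```

Under this assumption, a handful of instances per image costs a few megabytes, while an allocation in the gigabyte range points to something on the order of a thousand boxes in a single image.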
Thanks for that. One thing I want to make sure of is that the OOM is not caused by loading all the data before training in the Dataloader, right? Is it true that we only load one batch of images and targets at a time when training?
I think there are no more than 5 instances per image, and both GPU memory and CPU memory are large.
The OOM is happening on the GPU, so it's probably not related to data loading I believe. The data loader usually acts on CPU data.
Update: I set NUM_WORKERS = 0 and this problem is gone.
I encountered OOM when loading data, as follows.
Traceback (most recent call last):
File "tools/train_net.py", line 206, in <module>
main()
File "tools/train_net.py", line 199, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 108, in train
arguments,
File "/root/txy1/mask-rcnn/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
idx, batch = self._get_batch()
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
return self.data_queue.get()
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/queues.py", line 94, in get
res = self._recv_bytes()
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
File "/opt/conda/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 274, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 10507) is killed by signal: Killed.
I have carried out a few experiments. When I use a dataset of only a few images (like 2 or 5), training works well. However, when I try to train on a large dataset, there is always a GPU OOM problem. So I think the cause may not be too many instances in an image, but rather that I saved some data about the whole dataset in GPU memory, which gives me a GPU OOM with a large dataset. Could you please give me a hint about where to debug? Thanks in advance!
Depending on how you stored the data in the dataset (for example a numpy array), each worker will copy the whole object to each new thread, making it require a lot of CPU memory. If those are torch tensors, you should be fine.
I'd recommend not loading the full dataset in memory, but instead loading it at every __getitem__ call.
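A minimal sketch of that recommendation, assuming masks are stored as one file per item: the dataset object keeps only file paths, so copying it into worker processes is cheap, and each array is read when it is requested. `LazyMaskDataset` and the `.npy` layout are hypothetical and are not part of maskrcnn-benchmark.

```python
import os
import tempfile

import numpy as np


class LazyMaskDataset:
    """Hypothetical dataset that keeps only file paths in memory."""

    def __init__(self, mask_dir):
        self.paths = sorted(
            os.path.join(mask_dir, name)
            for name in os.listdir(mask_dir)
            if name.endswith(".npy")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # The array is read from disk only when this item is requested.
        return np.load(self.paths[idx])


# Usage with a few throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        np.save(os.path.join(tmp, f"mask_{i}.npy"), np.full((4, 4), i, dtype=np.uint8))
    dataset = LazyMaskDataset(tmp)
    num_items = len(dataset)
    last = dataset[2]

print(num_items, last.shape)
```

The same pattern applies to images and per-instance masks: build the list of names in `__init__` and open the files in `__getitem__`.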
Thanks for your patience! It turns out that there were too many instances in some images, which was the result of bad image labeling. After removing those images and labels, the training works well.
Awesome, thanks!
So, to summarize, the only thing that you had to change in order for your training to work as expected (on top of the PR adding better mask support) is to change the interpolation mode to bilinear instead of nearest, is that right?
Sorry for the late reply. Yes, after changing the interpolation, it works for me. However, resize and crop operations on a binary mask can only use int coordinates, whereas polygons can use float coordinates. So if the mask resolution is small (like 28), the ground-truth mask derived from a binary mask is not as good as one derived from polygons.
@txytju sounds good, thanks for the information!
Could you merge your code? I think your work is awesome. What's more, is there any schedule for supporting RLE format masks? Some COCO datasets use the RLE format. I think it would be nice, since then we wouldn't have to extract instances from PNG images.
@JoyHuYY1412 I'm willing to merge the PR that adds support for it, I'd just ask for it to have unit tests so that we know we are computing the same things for polygons and masks
I could help with the unit tests, this PR would be useful for many I think
@botcs yes, please!
So, could you please make a list of tests that should be done? I am now currently running my DIY script for GTA5->COCO polygon... and trust me I have time :D
Another thing: if this module is merged, then can we use the evaluation tool, tools/test_net.py?
Here are a few tests I think would be useful to have:
If support for masks is added, then it will be very simple to use it in either tools/train_net.py or tools/test_net.py.
Let me know if you have further questions!
Okay, I still have a few questions, just for clarification:
test_segmentation_mask.py with the mentioned tests implemented: resize / transpose / crop / convert
why is it required to re-implement them?
../home/csbotos/anaconda3/envs/debugmask/lib/python3.7/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
diff resize: tensor(210.)
F0 tensor(218.)
F
======================================================================
FAIL: test_resize (__main__.TestSegmentationMask)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_segmentation_mask.py", line 51, in test_resize
self.assertTrue(torch.equal(mask_from_poly_resize, mask_resize))
AssertionError: False is not true
======================================================================
FAIL: test_transpose (__main__.TestSegmentationMask)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_segmentation_mask.py", line 44, in test_transpose
self.assertTrue(torch.equal(mask_flip, mask_from_poly_flip))
AssertionError: False is not true
----------------------------------------------------------------------
Ran 4 tests in 0.021s
FAILED (failures=2)
[Edit]: I will continue this thread at #150 and will get back to this when unit tests are done
@fmassa @botcs I think it is hard to make the behavior of binary masks and polygons exactly the same, apart from the old Detectron inconsistency.
PR #150 alone may not be enough for binary masks to work as well as polygons, since this codebase was optimized for polygon-based input.
A possibly good practice would be to try PR #150 and make the necessary modifications while using binary masks, so that COCO performance is as good as with polygons (e.g., by adding a global config flag to switch between these two modes). You may refer to mmdetection for the necessary changes, since it inherently uses binary masks.
Thanks for your comments @wangg12 ! About the codebase being optimized for polygons, do you mean that the results were optimized using polygons? Because runtime-wise, it shouldn't be too different maybe? But it would use more memory, for sure.
@botcs I've commented in #150 , let me know what you think.
Hello, I am trying to do a background-color-based segmentation of a soccer pitch so that I can detect just the entities on the field, without any noise, using Detectron2. I have been trying to mask with the green HSV color range, but I have issues using the masked frames in Detectron2 for any analysis. Any thoughts on how to go about this, please?
🚀 Feature
An instance mask image can be used as the ground-truth label. For example, in the PNG file, every instance is labeled with a unique color.
Motivation
Currently, instance annotations are COCO-style, in which an instance mask is annotated by polygons. However, if an instance mask has holes, the polygon annotation fails. A binary instance mask PNG, on the other hand, can handle holes in the instance masks.