Augmentations on VOC dataset error

🐛 Bug

Describe the bug

Augmentations do not work when the VOC dataset is loaded with the VOCMaskParser class. When a record contains more than one instance of the same class, VOCMaskParser does not create a separate mask for each instance. In the example code below, the call fails when it tries to display a record with multiple cat instances in the image. I think the problem is in icevision/core/mask.py:

class VocMaskFile(MaskFile):
    ...
    def to_mask(self, h, w) -> MaskArray:
        mask_arr = np.array(Image.open(self.filepath))
        obj_ids = np.unique(mask_arr)[1:]
        masks = mask_arr == obj_ids[:, None, None]

        if self.drop_void:
            masks = masks[:-1, ...]

        return MaskArray(masks)
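
To illustrate the suspected cause, here is a minimal NumPy sketch of the same splitting logic applied to toy arrays. The helper `split_masks` and both toy masks are hypothetical and not part of icevision, and the `drop_void` step is left out. It assumes the mask file stores one integer value per class with 0 as background. Under that assumption, three cats all share the value 18 and come out as a single mask, while a mask that stores one value per instance yields one mask per object.

```python
import numpy as np


def split_masks(mask_arr: np.ndarray) -> np.ndarray:
    """Mirror the quoted logic: one boolean mask per unique non-background value."""
    # Drop the smallest value, assumed here to be the background (0).
    obj_ids = np.unique(mask_arr)[1:]
    # Broadcast (n_ids, 1, 1) against (h, w) to get (n_ids, h, w).
    return mask_arr == obj_ids[:, None, None]


# Toy class-level mask: three separate "cat" pixels, all stored as class index 18.
class_mask = np.zeros((4, 6), dtype=np.uint8)
class_mask[0, 0] = 18
class_mask[2, 2] = 18
class_mask[3, 5] = 18

# Toy instance-level mask: the same three pixels, each stored with its own id.
instance_mask = np.zeros((4, 6), dtype=np.uint8)
instance_mask[0, 0] = 1
instance_mask[2, 2] = 2
instance_mask[3, 5] = 3

class_masks = split_masks(class_mask)
instance_masks = split_masks(instance_mask)

print(class_masks.shape)     # (1, 4, 6): one mask for three objects
print(instance_masks.shape)  # (3, 4, 6): one mask per object
```

If this is what happens with the real files, the record would end up with three labels and three bboxes but only one mask array entry.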

To Reproduce

Steps to reproduce the behavior:

# Imports
from icevision.all import *
import icedata

# Load the Pascal VOC dataset
path = icedata.voc.load_data()

# Get the class_map, a utility that maps numeric IDs to class names
class_map = icedata.voc.class_map()

parser = parsers.VOCMaskParser(
    annotations_dir='path/to/voc/Annotations', 
    images_dir='path/to/voc/JPEGImages', 
    masks_dir='path/to/voc/SegmentationClass',
)

# Parse data
train_records, _ = parser.parse(data_splitter=None)

# Transformations
train_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(size=1024), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)

# Get indices of records that contain more than one instance of the same class
bad_records_idx = [idx for idx, r in enumerate(train_ds.records) if len(r.detection.labels) != len(set(r.detection.labels))]

idx = bad_records_idx[0]
print(train_ds.records[idx])
plt.imshow(Image.open(train_ds.records[idx].common.filepath))

show_samples([train_ds[idx]])

Output:

BaseRecord

common: 
    - Record ID: 2011_000758
    - Image size ImgSize(width=500, height=332)
    - Filepath: /home/ubuntu/.icevision/data/voc/JPEGImages/2011_000758.jpg
    - Img: None
detection: 
    - Class Map: <ClassMap: {'background': 0, 'aeroplane': 1, 'person': 2, 'tvmonitor': 3, 'train': 4, 'boat': 5, 'dog': 6, 'chair': 7, 'bird': 8, 'bicycle': 9, 'bottle': 10, 'sheep': 11, 'diningtable': 12, 'horse': 13, 'motorbike': 14, 'sofa': 15, 'cow': 16, 'car': 17, 'cat': 18, 'bus': 19, 'pottedplant': 20}>
    - Labels: [18, 18, 18]
    - BBoxes: [<BBox (xmin:1, ymin:54, xmax:136, ymax:188)>, <BBox (xmin:34, ymin:69, xmax:220, ymax:330)>, <BBox (xmin:273, ymin:68, xmax:497, ymax:206)>]
    - masks: [<icevision.core.mask.VocMaskFile object at 0x7fc1cc6b83d0>]
    - mask_array: None


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-225-561c1b9025a4> in <module>
      1 print(train_ds.records[bad_records_idx[0]])
----> 2 show_samples([train_ds[bad_records_idx[0]]])

~/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/icevision/data/dataset.py in __getitem__(self, i)
     35         record = self.records[i].load()
     36         if self.tfm is not None:
---> 37             record = self.tfm(record)
     38         else:
     39             # HACK FIXME

~/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/icevision/tfms/transform.py in __call__(self, record)
      9         # TODO: this assumes record is already loaded and copied
     10         # which is generally true
---> 11         return self.apply(record)
     12 
     13     @abstractmethod

~/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/icevision/tfms/albumentations/albumentations_adapter.py in apply(self, record)
    285         # collect results
    286         for collect_op in sorted(self._collect_ops, key=lambda x: x.order):
--> 287             collect_op.fn(record)
    288 
    289         return record

~/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/icevision/tfms/albumentations/albumentations_adapter.py in collect(self, record)
    134 
    135     def collect(self, record):
--> 136         masks = self.adapter._filter_attribute(self.adapter._albu_out["masks"])
    137         masks = MaskArray(np.array(masks))
    138         self._record_component.set_mask_array(masks)

~/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/icevision/tfms/albumentations/albumentations_adapter.py in _filter_attribute(self, v)
    304         if self._keep_mask is None or len(self._keep_mask) == 0:
    305             return v
--> 306         assert len(v) == len(self._keep_mask)
    307         return [o for o, keep in zip(v, self._keep_mask) if keep]
    308 

AssertionError:
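
The assertion at the bottom of the traceback compares the number of transformed masks with the length of the keep mask. Below is a minimal sketch of that check in isolation. `filter_attribute` is a standalone stand-in written for this report, not the library code, and it assumes (the traceback does not confirm this) that the keep mask holds one flag per bounding box.

```python
def filter_attribute(values, keep_mask):
    """Stand-in for the `_filter_attribute` logic shown in the traceback."""
    if keep_mask is None or len(keep_mask) == 0:
        return values
    assert len(values) == len(keep_mask)
    return [v for v, keep in zip(values, keep_mask) if keep]


# Assumed: one flag per bbox, so three flags for the three cats in the record.
keep_mask = [True, True, True]
# Only one mask came out of the mask file for all three cats (placeholder value).
masks = ["mask_for_all_cats"]

try:
    filter_attribute(masks, keep_mask)
    failed = False
except AssertionError:
    failed = True

print(failed)
```

With one mask per instance the lengths would match and the check would pass, which is consistent with the failure only showing up on records with repeated labels.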

Expected behavior

It should show an image of the record with an overlay of the mask and bbox, with the specified augmentations applied.

Environment:

OS: Ubuntu 18.04
AWS EC2 AMI: Deep Learning AMI (Ubuntu 18.04) Version 54.0
