facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License

change dataset, #77

Open LU4E opened 5 years ago

LU4E commented 5 years ago

❓ Questions and Help

I changed the dataset, because I want to focus on segmentation. I provide the mask and label, but the code in maskrcnn_benchmark.modeling.matcher.Matcher has a problem. The matcher needs an N*M tensor, but boxlist_iou(target, proposal) prints the following output shapes:

```
boxlist ops.py: iou shape is torch.Size([1, 16368])   (printed 16 times)
boxlist ops.py: iou shape is torch.Size([1, 126])
boxlist ops.py: iou shape is torch.Size([1, 128])
boxlist ops.py: iou shape is torch.Size([1, 132])
boxlist ops.py: iou shape is torch.Size([1, 127])
boxlist ops.py: iou shape is torch.Size([1, 126])
boxlist ops.py: iou shape is torch.Size([1, 125])
boxlist ops.py: iou shape is torch.Size([1, 130])
boxlist ops.py: iou shape is torch.Size([1, 123])
boxlist ops.py: iou shape is torch.Size([1, 128])
boxlist ops.py: iou shape is torch.Size([1, 133])
boxlist ops.py: iou shape is torch.Size([1, 119])
boxlist ops.py: iou shape is torch.Size([1, 126])
boxlist ops.py: iou shape is torch.Size([1, 121])
boxlist ops.py: iou shape is torch.Size([1, 135])
boxlist ops.py: iou shape is torch.Size([1, 108])
boxlist ops.py: iou shape is torch.Size([1, 129])
boxlist ops.py: iou shape is torch.Size([1, 1])
```

the result is

*(four screenshots of the error output)*

I have some questions:

  1. Why does boxlist_iou run 33 times? My config size_divisibility is 16.
  2. Why does a [1, 1] IoU matrix make the segmentation_masks become [0]?
  3. How can I run the code step by step, so that I can understand all of the code?
fmassa commented 5 years ago

Hi,

If you are training on 8 GPUs, it's normal that it gets printed many times, because each GPU will print it once. In this case, note that boxlist ops.py: iou shape is torch.Size([1, 16368]) is printed 16 times, which seems normal for 2 images per GPU on 8 GPUs.

To run the code step by step, do not use multiple GPUs: run on a single GPU without -m torch.distributed.launch, and debug from there.

About the rest of the question, I unfortunately couldn't understand the issue very well; maybe you could try explaining a bit more?

LU4E commented 5 years ago

I am training on a single GPU, and I tried gdb, but it doesn't help; I am still learning how to use it. The problem is that the mask becomes [0] after this block in match_targets_to_proposals:

```python
def match_targets_to_proposals(self, proposal, target):
    match_quality_matrix = boxlist_iou(target, proposal)
    matched_idxs = self.proposal_matcher(match_quality_matrix)
    # Mask R-CNN needs "labels" and "masks" fields for creating the targets
    target = target.copy_with_fields(["labels", "masks"])
    # get the targets corresponding GT for each proposal
    # NB: need to clamp the indices because we can have a single
    # GT in the image, and matched_idxs can be -2, which goes
    # out of bounds
    matched_targets = target[matched_idxs.clamp(min=0)]
    matched_targets.add_field("matched_idxs", matched_idxs)
    return matched_targets
```

I can't figure out what the shape of matched_idxs should be.
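For reference, a minimal sketch of the shapes involved here (the 0.5 threshold below is a simplification; the real Matcher uses separate high/low thresholds and the sentinel values -1 and -2):

```python
import torch

# boxlist_iou(target, proposal) returns an [M, N] tensor:
#   M = number of ground-truth boxes, N = number of proposals
M, N = 1, 128
match_quality_matrix = torch.rand(M, N)

# the matcher reduces it to an [N] LongTensor: for each proposal, the
# index of its matched GT box in [0, M-1], or a negative sentinel
matched_vals, matched_idxs = match_quality_matrix.max(dim=0)
matched_idxs[matched_vals < 0.5] = -1  # simplified thresholding

print(matched_idxs.shape)  # torch.Size([128])
```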

fmassa commented 5 years ago

If I understand it correctly, the reason might be that you have one (or more) images in your dataset that do not have boxes.

Can you try https://github.com/facebookresearch/maskrcnn-benchmark/pull/37 and follow the discussion in https://github.com/facebookresearch/maskrcnn-benchmark/issues/31 to see if that's indeed the case?
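A quick way to check for such images, assuming a dataset whose __getitem__ returns (image, target, idx) as in this repo's COCODataset (a sketch):

```python
for idx in range(len(dataset)):
    _, target, _ = dataset[idx]   # target is a BoxList
    boxes = target.bbox           # [N, 4] tensor in xyxy order
    if boxes.numel() == 0:
        print("sample %d has no boxes" % idx)
        continue
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    if (widths <= 0).any() or (heights <= 0).any():
        print("sample %d has a degenerate box:" % idx, boxes)
```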

LU4E commented 5 years ago

I built the boxes manually from the boundary of the masks, and every sample should have at least one box. But in some samples the box can be very, very small. I will check the box information and the code again tomorrow, because the lab is closing. Thank you very much for replying so many times.

LU4E commented 5 years ago

I have checked all of the issues, and I also found issue #31, but I don't think my problem is the same as that one. I removed the samples without masks before building the dataset, and the boxes are built according to the masks, so I am sure every sample has at least one box.

fmassa commented 5 years ago

Could you try applying the patch from #37 in your code so that we have an idea if the problem is there or not?

I'm having trouble understanding what else the problem could be.

LU4E commented 5 years ago

I changed the code like this:

*(screenshot of the modified code)*

Unfortunately it doesn't work; no error is caught.

I also print the box information in __getitem__, which gives results like this:

```
2018-11-01 07:54:12,314 maskrcnn_benchmark.trainer INFO: Start training
dataset.py boxes is [[149 100 180 130]]
dataset.py boxes is [[ 75  95 122 189]]
dataset.py boxes is [[100 167 124 186]]
dataset.py boxes is [[ 77 109  96 145]]
(... roughly 70 more one-box lines, all with valid coordinates ...)
boxlist ops.py: iou shape is torch.Size([1, 16368])   (printed 16 times,
    interleaved with a few more box lines)
boxlist ops.py: iou shape is torch.Size([1, 139])     (then 16 shapes ranging
    from [1, 96] to [1, 151])
boxlist ops.py: iou shape is torch.Size([1, 1])
(... about 30 more box lines follow, including some very small boxes such as
    [[142 204 145 208]], [[139 197 145 202]], and [[146 151 148 154]] ...)
```

The error is still this one:

```
Traceback (most recent call last):
  ...
AssertionError: [0], BoxList(num_boxes=1, image_width=256, image_height=256, mode=xyxy)
```

LU4E commented 5 years ago

I found the source of the problem: the shape of the mask. This issue is similar to issue #16. I don't know what shape the COCO masks have; my mask shape is [256, 256, 1]:

*(screenshot)*

That leads to matched_targets getting a wrong mask:

*(screenshot)*

I changed the mask format to [1, 256, 256]; the target receives the mask as [1, 256, 256], but matched_targets gets the mask as [256, 256]:

*(screenshot)*

That causes another problem, after this line: `positive_proposals = proposals_per_image[positive_inds]`:

*(screenshot)*

So either I should change the code, or I should change the mask format. Could you explain what shape the mask should have in this model?
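The layout change described above amounts to this (illustrative shapes):

```python
import numpy as np
import torch

mask_hwc = np.zeros((256, 256, 1), dtype=np.uint8)       # [H, W, C], as my dataset produces
mask_chw = torch.from_numpy(mask_hwc).permute(2, 0, 1)   # -> torch.Size([1, 256, 256])
```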

fmassa commented 5 years ago

Oh, now I think I see the issue.

SegmentationMask currently assumes the masks are represented as a set of polygons, and not a binary segmentation mask.

This can be modified by adding support for binary segmentation masks as well. This is almost fully implemented, but as I didn't have any dataset to test it, I didn't finish it up.

Have a look at segmentation_mask.py and the Mask class there to see what I mean. I suppose this might be the problem?
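For reference, what SegmentationMask expects today is COCO-style polygons, roughly like this (a sketch; the coordinates are made up):

```python
from maskrcnn_benchmark.structures.segmentation_mask import SegmentationMask

# one instance = a list of polygons; each polygon is a flat list
# [x0, y0, x1, y1, ...] tracing the instance contour
polygons_per_instance = [
    [[120.0, 80.0, 180.0, 80.0, 180.0, 140.0, 120.0, 140.0]],  # a square instance
]
masks = SegmentationMask(polygons_per_instance, size=(256, 256))
```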

LU4E commented 5 years ago

What should I do now? Does the mask have to be represented as a set of polygons for now? I wrote the function that builds the mask myself, instead of using the class SegmentationMask. So what can I do in this function to meet the requirements of the model?

fmassa commented 5 years ago

For now, you can also try modifying SegmentationMask to support a Mask object instead of Polygons. It should not be hard, but it requires some coding. Let me know if you need more guidance and I can help you.

LU4E commented 5 years ago

Yes, I need the help. The biggest problem now is that I don't know the mask format needed by the model. Could you tell me what functions the class Mask should offer? Does the class SegmentationMask need to be adjusted?

fmassa commented 5 years ago

@LU4E have a look at Mask. It starts the implementation of what you need: a class that holds segmentation masks as 2d tensors. You'll need to implement the resize method there (which can be done with torch.nn.functional.interpolate), and then modify the SegmentationMask class to use a list of Mask instead of a list of Polygons.

One last thing: you might also need to implement a convert method in Mask, which simply returns the underlying mask data as a Tensor.

Basically, if you make the API of Mask be the same as the API in Polygons, and replace

```python
self.polygons = [Polygons(p, size, mode) for p in polygons]
```

by

```python
self.polygons = [Mask(p, size, mode) for p in polygons]
```

in here, it should be enough for the code to run.
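The constructor after that swap might look roughly like this (a sketch; it keeps the existing assert that the input is a list):

```python
class SegmentationMask(object):
    def __init__(self, masks, size, mode=None):
        assert isinstance(masks, list)
        # one Mask per instance instead of one Polygons per instance
        self.polygons = [Mask(mask, size, mode) for mask in masks]
        self.size = size
        self.mode = mode
```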

LU4E commented 5 years ago

OK, let me have a try.

LU4E commented 5 years ago

*(four screenshots of the modified code)* There is a bug in crop:

*(two screenshots of the error)*

fmassa commented 5 years ago

There is a problem with your implementation, I think: the Mask class is supposed to hold a single 2d segmentation mask (with possibly many channels), so self.masks should be a 3d Tensor, and the resize function shouldn't be passing a list to Mask.

LU4E commented 5 years ago

The class SegmentationMask asserts that the input is a list, so I turned the mask into a list to keep the code working for both masks and polygons. I am not familiar with polygons. If it's not necessary to ensure the masks are a list, can I just delete the assert isinstance(polygons, list)?

fmassa commented 5 years ago

So, you should think of SegmentationMask as a list of Mask, and a Mask as a single Tensor. But I believe that in your case you made scaled_masks be a list.
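In other words, the intended shapes are roughly (an illustrative sketch using the Mask class being discussed):

```python
import torch

# SegmentationMask holds a list of Mask, one per instance;
# each Mask holds a single 3d tensor of shape (channels, height, width)
single_instance = Mask(torch.zeros(1, 256, 256), size=(256, 256), mode=None)
print(single_instance.masks.shape)  # torch.Size([1, 256, 256])
```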

LU4E commented 5 years ago

I understand your point. I thought SegmentationMask received a list of polygons and divided it into instances automatically, and the same for masks; that is why I didn't change SegmentationMask. I will adjust the code of Mask, and google polygons to understand all of this class. I am working on another job which costs too much time, so I have no time to check the flow of the data. I appreciate your help very much, and thank you again.

fmassa commented 5 years ago

No worries!

That being said, it would be great to have full support for Mask in the codebase, so PRs adding support for that would be more than welcome!

LU4E commented 5 years ago

Yes. At least for me, I'm desperate for this feature, because it's clear to me what I am doing to an array, but I have no idea how to process a polygon.

LU4E commented 5 years ago

I changed the class like this:

```python
import torch
from torch.nn.functional import interpolate

# FLIP_LEFT_RIGHT and FLIP_TOP_BOTTOM are the module-level constants
# already defined in segmentation_mask.py
FLIP_LEFT_RIGHT = 0
FLIP_TOP_BOTTOM = 1


class Mask(object):
    """Holds the binary mask(s) of a single instance as a 3d tensor
    of shape (channels, height, width)."""

    def __init__(self, masks, size, mode):
        self.masks = masks  # 3d tensor (C, H, W)
        self.size = size    # (image_width, image_height)
        self.mode = mode

    def transpose(self, method):
        if method not in (FLIP_LEFT_RIGHT, FLIP_TOP_BOTTOM):
            raise NotImplementedError(
                "Only FLIP_LEFT_RIGHT and FLIP_TOP_BOTTOM implemented"
            )

        width, height = self.size
        if method == FLIP_LEFT_RIGHT:
            dim, idx = width, 2   # flip along the width axis
        else:
            dim, idx = height, 1  # flip along the height axis

        # index_select needs a LongTensor of reversed indices
        flip_idx = torch.arange(dim - 1, -1, -1)
        flipped_masks = self.masks.index_select(idx, flip_idx)
        return Mask(flipped_masks, self.size, self.mode)

    def crop(self, box):
        box = [int(b) for b in box]
        w, h = box[2] - box[0], box[3] - box[1]
        # slice all channels at once so the result stays a 3d tensor
        cropped_masks = self.masks[:, box[1]:box[3], box[0]:box[2]]
        return Mask(cropped_masks, size=(w, h), mode=self.mode)

    def resize(self, size, *args, **kwargs):
        # torch.nn.functional.interpolate expects a 4d float input
        # (N, C, H, W), so add a batch dimension around the call;
        # note that its size argument is (height, width)
        width, height = size
        scaled_masks = interpolate(
            self.masks[None].float(), size=(height, width), mode="nearest"
        )[0]
        return Mask(scaled_masks, size=size, mode=self.mode)

    def convert(self, mode):
        # return the underlying mask data as a 2d tensor
        return self.masks.squeeze(0)

    def __iter__(self):
        return iter(self.masks)

    def __repr__(self):
        s = self.__class__.__name__ + "("
        s += "num_masks={}, ".format(len(self.masks))
        s += "image_width={}, ".format(self.size[0])
        s += "image_height={}, ".format(self.size[1])
        s += "mode={})".format(self.mode)
        return s
```

and it works well. Thanks again for all your help these days.

wangg12 commented 5 years ago

@fmassa +1 for the support of this feature, and for a feature to convert the RLE/polygon format to binary masks.

fmassa commented 5 years ago

@wangg12 I agree. The support for Mask is almost all done by @LU4E already; support for converting to binary masks can be added at a later step.

Spandan-Madan commented 5 years ago

I think my problem is related. All I want is to train the model with my own data loader.

Currently, my data loader gives a tensor of NxHxWxC images (N = batch size, C = channels = 3), and targets of size NxHxW, a tensor of N binary masks.

When I try training, at this block:

model.train();
for data in data_loader:
    images,targets = data
    images = images.to(device)
    targets = [target.to(device) for target in targets]
    loss_dict = model(images, targets)
    break

I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-3e622fd4a3f8> in <module>
      8 #     label_var = Variable(labels.long().cuda())
      9 
---> 10     loss_dict = model(images, targets)
     11     break

~/miniconda3/envs/torch_tens/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    477             result = self._slow_forward(*input, **kwargs)
    478         else:
--> 479             result = self.forward(*input, **kwargs)
    480         for hook in self._forward_hooks.values():
    481             hook_result = hook(self, input, result)

/data/graphics/toyota-pytorch/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py in forward(self, images, targets)
     48         images = to_image_list(images)
     49         features = self.backbone(images.tensors)
---> 50         proposals, proposal_losses = self.rpn(images, features, targets)
     51         if self.roi_heads:
     52             x, result, detector_losses = self.roi_heads(features, proposals, targets)

~/miniconda3/envs/torch_tens/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    477             result = self._slow_forward(*input, **kwargs)
    478         else:
--> 479             result = self.forward(*input, **kwargs)
    480         for hook in self._forward_hooks.values():
    481             hook_result = hook(self, input, result)

/data/graphics/toyota-pytorch/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in forward(self, images, features, targets)
     92 
     93         if self.training:
---> 94             return self._forward_train(anchors, objectness, rpn_box_regression, targets)
     95         else:
     96             return self._forward_test(anchors, objectness, rpn_box_regression)

/data/graphics/toyota-pytorch/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in _forward_train(self, anchors, objectness, rpn_box_regression, targets)
    108             with torch.no_grad():
    109                 boxes = self.box_selector_train(
--> 110                     anchors, objectness, rpn_box_regression, targets
    111                 )
    112         loss_objectness, loss_rpn_box_reg = self.loss_evaluator(

~/miniconda3/envs/torch_tens/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    477             result = self._slow_forward(*input, **kwargs)
    478         else:
--> 479             result = self.forward(*input, **kwargs)
    480         for hook in self._forward_hooks.values():
    481             hook_result = hook(self, input, result)

/data/graphics/toyota-pytorch/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in forward(self, anchors, objectness, box_regression, targets)
    146         # append ground-truth bboxes to proposals
    147         if self.training and targets is not None:
--> 148             boxlists = self.add_gt_proposals(boxlists, targets)
    149 
    150         return boxlists

/data/graphics/toyota-pytorch/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in add_gt_proposals(self, proposals, targets)
     58         device = proposals[0].bbox.device
     59 
---> 60         gt_boxes = [target.copy_with_fields([]) for target in targets]
     61 
     62         # later cat of bbox requires all fields to be present for all bbox

/data/graphics/toyota-pytorch/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in <listcomp>(.0)
     58         device = proposals[0].bbox.device
     59 
---> 60         gt_boxes = [target.copy_with_fields([]) for target in targets]
     61 
     62         # later cat of bbox requires all fields to be present for all bbox

AttributeError: 'Tensor' object has no attribute 'copy_with_fields'

Any suggested quick fixes?

Thanks!

LU4E commented 5 years ago

@Spandan-Madan I don't know what the data looks like; could you share the dataloader you built? Two suggestions: first, the image shape should be NxCxHxW, since PyTorch is channels-first. Second, have you added the fields to the target in the dataloader? This can be done like: targets.add_field('mask', mask)

Spandan-Madan commented 5 years ago

@LU4E

My bad, it is N x C x H x W, I wrote the wrong order above by mistake.

The data loader right now returns a tuple of 2 elements: the first is a tensor with a batch of images, the second a tensor with the batch of corresponding labels. Each image is 224x224 and the batch size is 8, so the first tensor is 8x3x224x224. The labels are all 224x224 binary masks, as I have only 2 categories: pixel = 1 if the region in the image corresponds to a car, 0 otherwise.

I probably should change the binary masks to BoxLists? I'm actually quite confused about how to pass things into the model.

To me, it is most intuitive if I can do model(inputs) and get the outputs; the loss etc. can then be calculated on this output. It's gotten really confusing because everything is forced into a single function.

Any suggestions on how I should change the dataloader?

Thanks a lot!

LU4E commented 5 years ago

The targets have fields like mask and label. Build a BoxList and add the fields to it. If you don't want to change the code of Mask R-CNN, make sure your dataset has the same format as the sample it offers:

```python
# create a BoxList from the boxes
boxlist = BoxList(boxes, image.size, mode="xyxy")
# add the labels to the boxlist
boxlist.add_field("labels", labels)
```

You can write your code like this.
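A fuller __getitem__ along these lines might look like this (a sketch; load_image, load_boxes, load_labels, and load_polygons are hypothetical helpers of your dataset class):

```python
import torch
from maskrcnn_benchmark.structures.bounding_box import BoxList
from maskrcnn_benchmark.structures.segmentation_mask import SegmentationMask

def __getitem__(self, idx):
    image = self.load_image(idx)   # a PIL image (hypothetical helper)
    boxes = self.load_boxes(idx)   # [N, 4] boxes in xyxy order
    boxlist = BoxList(boxes, image.size, mode="xyxy")
    boxlist.add_field("labels", torch.tensor(self.load_labels(idx)))
    boxlist.add_field("masks", SegmentationMask(self.load_polygons(idx), image.size))
    if self.transforms is not None:
        image, boxlist = self.transforms(image, boxlist)
    return image, boxlist, idx
```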

Spandan-Madan commented 5 years ago

@LU4E I have a question though. If I make a BoxList, it will pass my segmentation masks as boxes. This is great for detecting a bounding box around them, but how does it get the segmentation masks?

fmassa commented 5 years ago

@Spandan-Madan sorry if parts of the code are confusing. Indeed, detection models are very complicated, and their behavior during training depends on the ground truth, which makes things even more complicated. The reason is that the RPN uses the ground truths during training to select which boxes to feed to the detection heads, so simply following output = model(inputs); loss = criterion(output, target) is not going to be as efficient / easy. But during inference you should be able to simply do model(inputs).

About segmentation masks, do you want to do instance segmentation or semantic segmentation? If you want to do instance segmentation, for each mask you can compute its bounding box during __getitem__ of your dataset, and attach it to the BoxList.
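Computing such a box from a binary mask could look like this (a sketch; assumes a 2d [H, W] tensor with at least one foreground pixel):

```python
import torch

def mask_to_box(mask):
    """Tight xyxy bounding box around the nonzero pixels of a 2d mask."""
    pos = torch.nonzero(mask)  # [K, 2] tensor of (y, x) coordinates
    ys, xs = pos[:, 0], pos[:, 1]
    return [xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()]
```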

If you want to do semantic segmentation (where instances are not necessary), then this codebase is not yet adapted to your needs.

Let us know if you have further questions.

LU4E commented 5 years ago

@Spandan-Madan no, I think you are making a mistake. The box is just for the detection; the label for your segmentation is the mask. You can build the box using your mask data.

Spandan-Madan commented 5 years ago

@fmassa That makes sense. We don't have instance-level labels yet, so we can't get a bounding box around each object.