fizyr / keras-maskrcnn

Keras implementation of MaskRCNN object detection.
Apache License 2.0

Off-by-one rounding error in OID mask resizing #106

Open woutgg opened 4 years ago

woutgg commented 4 years ago

I ran into this issue while attempting to train on my own OpenImages-based dataset. It surfaces as an error like:

ValueError: could not broadcast input array from shape (963200) into shape (962400)

The problem is that OID masks have a long edge of 1600px, whereas images have a long edge of at most 1024px. In some cases this causes a rounding error during rescaling. For instance:
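A sketch of how such a mismatch can arise, using hypothetical dimensions (the actual image and mask sizes are not given here). It assumes the scale is derived from a short side of 800 and capped at a long side of 1333, which is my understanding of the defaults of keras-retinanet's resize_image, and it approximates the output size by rounding each scaled dimension. The helper names are illustrative only.

```python
def compute_resize_scale(shape, min_side=800, max_side=1333):
    """Scale so the short side becomes min_side, capped so the long side stays <= max_side."""
    rows, cols = shape
    scale = min_side / min(rows, cols)
    if max(rows, cols) * scale > max_side:
        scale = max_side / max(rows, cols)
    return scale


def resized_shape(shape, min_side=800, max_side=1333):
    """(rows, cols) after scaling both sides by the same factor and rounding each."""
    scale = compute_resize_scale(shape, min_side, max_side)
    rows, cols = shape
    return (round(rows * scale), round(cols * scale))


# Hypothetical dimensions, chosen only for illustration.
image_shape = (681, 1024)   # image: long edge of 1024px
mask_shape = (1063, 1600)   # its mask: long edge of 1600px, aspect ratio off by a fraction

image_resized = resized_shape(image_shape)  # (800, 1203)
mask_resized = resized_shape(mask_shape)    # (800, 1204)

# One extra column across 800 rows gives a difference of 800 elements.
pixel_difference = (mask_resized[0] * mask_resized[1]
                    - image_resized[0] * image_resized[1])
```

With these particular numbers the flattened sizes come out as 963200 and 962400, the same counts as in the error message, because each is scaled by its own factor and rounded independently.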

Later on this results in the above error, where this difference of 800 pixels can be seen. My current fix is to extend Generator.resize_image() with an optional size argument:

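    def resize_image(self, image, size=None):
        # size, if given, is (width, height), the order cv2.resize expects.
        if size is not None:
            # No scale factor is computed on this path, so None takes its place.
            return (cv2.resize(image, size), None)
        else:
            return resize_image(image, min_side=self.image_min_side, max_side=self.image_max_side)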

The resized image dimensions are then passed for each accompanying mask so that it's guaranteed to have the same shape.
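The idea can be sketched without OpenCV. Below, resize_mask_nearest is a hypothetical pure-Python stand-in for cv2.resize(mask, size) with nearest-neighbour interpolation, and the shapes are made up for illustration. Like cv2.resize, it takes size as (width, height), which is the reverse of the (rows, cols) order of an array shape.

```python
def resize_mask_nearest(mask, size):
    """Nearest-neighbour resize of a 2-D mask (a list of rows) to size=(width, height)."""
    width, height = size
    src_rows, src_cols = len(mask), len(mask[0])
    return [
        [mask[y * src_rows // height][x * src_cols // width] for x in range(width)]
        for y in range(height)
    ]


# Hypothetical shapes: the image has already been resized to 4 rows x 6 columns.
resized_image_shape = (4, 6)
mask = [[0, 1, 1],
        [0, 0, 1]]  # 2 rows x 3 columns, deliberately a different size

# Convert (rows, cols) to the (width, height) order used for the target size.
size = (resized_image_shape[1], resized_image_shape[0])
resized_mask = resize_mask_nearest(mask, size)

# The mask now has exactly the image's shape, whatever its original size was.
resized_mask_shape = (len(resized_mask), len(resized_mask[0]))
```

Because the target size is taken from the resized image rather than recomputed from the mask's own dimensions, no second rounding step can disagree with the first.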

However, I'm not sure whether this is the best approach; perhaps you'd prefer to handle this in keras-retinanet's resize_image (which is being called here).

Let me know what you think and I'll submit a PR.

hgaiser commented 4 years ago

Their masks have a different size than their images? Why would they do that, it makes no sense =\

Would it be possible in load_annotations to resize the mask to be the same size as the image?

woutgg commented 4 years ago

I agree that it is rather inconvenient. :) Apparently resizing rules during annotation and during preparation of the final image set were different (according to https://github.com/openimages/dataset/issues/92#issuecomment-567618963).

Resizing the masks in load_annotations might be more logical, but unless I missed something, the image dimensions are not available there. So it would be possible, but you'd have to temporarily load each image just to obtain its size.

hgaiser commented 4 years ago

I don't know the OID format. In COCO, I believe they define the width and height in the annotations file; that would've been ideal :)

If that's not possible, then I'd suggest changing resize_image in keras-retinanet, and having the places where we call that function accept *args, **kwargs and pass them on to the function they wrap.
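A minimal sketch of that forwarding pattern, under stated assumptions: the module-level resize_image here is a stub standing in for keras-retinanet's function, its size parameter is the hypothetical addition from this thread, and the stub only reports the arguments it received instead of resizing anything. The wrapper forwards keyword arguments only, a simplification that avoids collisions between positional arguments and the generator's own min_side/max_side defaults.

```python
def resize_image(image, min_side=800, max_side=1333, size=None):
    """Stub standing in for keras-retinanet's resize_image.

    The size parameter is a hypothetical addition. The stub does no resizing;
    it only reports the arguments it received.
    """
    return {"min_side": min_side, "max_side": max_side, "size": size}


class Generator:
    def __init__(self, image_min_side=800, image_max_side=1333):
        self.image_min_side = image_min_side
        self.image_max_side = image_max_side

    def resize_image(self, image, **kwargs):
        # Fill in the generator's own settings unless the caller overrides them,
        # then forward everything else untouched to the wrapped function.
        kwargs.setdefault("min_side", self.image_min_side)
        kwargs.setdefault("max_side", self.image_max_side)
        return resize_image(image, **kwargs)


generator = Generator(image_min_side=600, image_max_side=1000)

# Image path: no extra arguments, so the generator's settings are used.
default_call = generator.resize_image(image=None)

# Mask path: the target size is passed through without the wrapper knowing about it.
mask_call = generator.resize_image(image=None, size=(1203, 800))
```

The appeal of this design is that the wrapper never needs to change again when the wrapped function gains new optional parameters.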