cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Mask returned in serverless automatic annotation does not agree with external result #7943

Closed. hermda02 closed this issue 3 months ago.

hermda02 commented 5 months ago

Steps to Reproduce

  1. Create a simple serverless detector function which produces a binary mask.
  2. Return an inference result in the following format, where cvat_mask = binary_mask + [bounding box corners]:

        results.append({
            "type": "mask",
            "confidence": str(prob),
            "label": "Net",
            "mask": cvat_mask,
        })
  3. Select "Automatic annotation" on an image within a CVAT task.

Expected Behavior

I expect CVAT to return an annotated image with the input binary mask applied, like the attached image (these are the same inference results that go into CVAT): [Image: image_prediction]

Possible Solution

Add documentation describing the mask format CVAT requires so that binary masks are handled correctly.

Context

I've created a serverless function around my own pre-trained PyTorch UNet model, which returns masks for a given image. Running inference both on my own machine and through Nuclio yields identical results as far as model outputs and mask details go.

However, when I send the mask to CVAT, it returns a messy image: [Image: Screenshot from 2024-05-24 12-22-05]

This appears to be an issue in the conversion from a binary mask to RLE (https://github.com/cvat-ai/cvat/blob/develop/cvat/apps/lambda_manager/views.py, lines 738-743).
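
For context, CVAT stores masks as run-length-encoded lists, where (as I understand it) the first run counts background (zero) pixels. A minimal illustration of that kind of encoding, not CVAT's actual implementation:

    def rle_encode(flat_mask):
        # Run-length encode a flat list of 0/1 pixel values. The first run
        # counts zeros, so a mask starting with 1 begins with a zero-length run.
        runs = []
        current, count = 0, 0
        for pixel in flat_mask:
            if pixel == current:
                count += 1
            else:
                runs.append(count)
                current, count = pixel, 1
        runs.append(count)
        return runs

    rle_encode([0, 0, 1, 1, 1, 0])  # -> [2, 3, 1]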

Environment

Server version: 2.9.0
Core version: 12.0.0
Canvas version: 2.18.0
UI version: 1.58.0

bsekachev commented 5 months ago

Hello,

If you can reproduce it with one of the functions already located in our repository, we will consider it a bug.

hermda02 commented 5 months ago

Unfortunately, I was not able to. If possible, I would like to change the label to a documentation issue.

hermda02 commented 5 months ago

This seems to be related to #6332. Any chance you know which piece of code parses the mask and applies it to the input image?

I'm finding that an input block mask (mask[:500, :500] = 1) returns a masked image that appears to suffer from some type of anti-aliasing: [Image]

bsekachev commented 5 months ago

Make sure you are sending the correct borders:

[0, 0, 500, 500] would be incorrect in this case; [0, 0, 499, 499] is correct.
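
In other words, the corners are inclusive pixel indices. A short sketch of the off-by-one (the mask shape is assumed for illustration):

    import numpy as np

    mask = np.zeros((1000, 1000), dtype=np.uint8)  # shape assumed for illustration
    mask[:500, :500] = 1      # occupied pixels span rows/cols 0..499

    box = [0, 0, 499, 499]    # correct: inclusive corners [xtl, ytl, xbr, ybr]
    # box = [0, 0, 500, 500]  # off by one: row/col 500 contains no mask pixels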

hermda02 commented 5 months ago

The borders are correct and are determined by torchvision.ops.masks_to_boxes.

Docker log output shows:

Borders: [0, 0, 499, 499]
Mask: tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]])
Sum(mask): tensor(250000)

Edit: Corresponding code:

        mask_out[:] = 0
        mask_out[:500,:500] = 1

        boxes = masks_to_boxes(mask_out.unsqueeze(0)).tolist()
        box = [int(b) for b in boxes[0]]

        print(box)

        print(mask_out)
        print(torch.sum(mask_out))

        cvat_mask = mask_out.long().tolist() + box

        prob = torch.nn.functional.softmax(output, dim=1).cpu().numpy()
        results.append({
            "type": "mask",
            "confidence": str(prob),
            "label": "Net",
            "mask": cvat_mask,   
        })
        return results

hermda02 commented 3 months ago


Solution was found by utilizing the to_cvat_mask function found here: https://github.com/cvat-ai/cvat/blob/develop/serverless/openvino/base/shared.py and implemented here: https://github.com/cvat-ai/cvat/blob/develop/serverless/openvino/omz/intel/semantic-segmentation-adas-0001/nuclio/model_handler.py