cvat-ai / cvat


Documentation on mask format for cvat_sdk annotations #6777

Closed by pHaeusler 10 months ago

pHaeusler commented 1 year ago

Is there documentation for the mask format used in the cvat_sdk?

I reverse engineered it by looking at: https://github.com/opencv/cvat/blob/develop/cvat-core/src/annotations-collection.ts#L849

Essentially, the points array must contain the RLE values of the mask cropped to its bounding box, followed by the [left, top, right, bottom] corners of that bounding box.

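from functools import reduce


def mask_to_rle(mask):
    """Run-length encode a flat binary sequence; counts start with the zeros run."""
    def reducer(acc, item):
        idx, val = item
        if idx > 0:
            # Extend the current run, or start a new one when the value changes
            if mask[idx - 1] == val:
                acc[-1] += 1
            else:
                acc.append(1)
            return acc

        # First pixel: a mask that starts with a 1 needs an empty run of zeros
        if val > 0:
            acc.extend([0, 1])
        else:
            acc.append(1)
        return acc
    return reduce(reducer, enumerate(mask), [])


def to_cvat_mask(box, mask):
    """Build the points array: RLE of the cropped mask plus the box corners."""
    xtl, ytl, xbr, ybr = box
    # The corners are inclusive, hence the +1 on both slices
    flattened = mask[ytl : ybr + 1, xtl : xbr + 1].flat[:].tolist()
    rle = mask_to_rle(flattened)
    rle.extend([xtl, ytl, xbr, ybr])
    return rle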

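import numpy as np
from cvat_sdk import models
# find_contours and approximate_polygon are assumed to be the scikit-image functions
from skimage.measure import approximate_polygon, find_contours

# raw_label, scan_point and task come from the surrounding script
shapes = []
for mask, label in zip(raw_label["masks"], raw_label["labels"]):
    binary_mask = mask.numpy()
    contours = find_contours(binary_mask, 0.5)
    if not contours:
        continue
    # find_contours returns (row, col) pairs; flip them to (x, y)
    contour = np.flip(contours[0], axis=1)
    contour = approximate_polygon(contour, tolerance=2.5)
    if len(contour) < 3:
        continue
    # Bounding box of the simplified contour, with inclusive integer corners
    Xmin = int(np.min(contour[:, 0]))
    Xmax = int(np.max(contour[:, 0]))
    Ymin = int(np.min(contour[:, 1]))
    Ymax = int(np.max(contour[:, 1]))
    cvat_mask = to_cvat_mask((Xmin, Ymin, Xmax, Ymax), binary_mask)
    shapes.append(
        models.LabeledShapeRequest(
            frame=scan_point,
            label_id=0,  # hardcoded here; `label` is not mapped to a CVAT label id
            type="mask",
            points=cvat_mask,
        )
    )

req = models.LabeledDataRequest(shapes=shapes)
task.set_annotations(data=req)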
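
One way to sanity-check the output of to_cvat_mask before uploading is to decode the points array back into a cropped mask and compare it with the input. The sketch below does that under the layout described above (alternating run lengths that start with zeros, followed by inclusive [left, top, right, bottom] corners). from_cvat_mask is a hypothetical helper name, not part of cvat_sdk.

```python
import numpy as np


def from_cvat_mask(points):
    """Decode [rle..., left, top, right, bottom] into (box, 2D uint8 crop).

    Hypothetical helper; assumes inclusive corners and runs starting with zeros.
    """
    *rle, xtl, ytl, xbr, ybr = (int(p) for p in points)
    width, height = xbr - xtl + 1, ybr - ytl + 1
    flat = np.zeros(width * height, dtype=np.uint8)
    pos, value = 0, 0  # runs alternate between 0 and 1, starting with 0
    for count in rle:
        flat[pos:pos + count] = value
        pos += count
        value = 1 - value
    if pos != width * height:
        raise ValueError("RLE length does not match the bounding box area")
    return (xtl, ytl, xbr, ybr), flat.reshape(height, width)


# A 3x3 crop whose top-left corner sits at x=10, y=20
points = [2, 4, 2, 1, 10, 20, 12, 22]
box, crop = from_cvat_mask(points)
print(box)
print(crop)
```

If the decoded crop differs from mask[ytl : ybr + 1, xtl : xbr + 1], the RLE or the corners are off.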
bsekachev commented 1 year ago

RLE is mentioned here: https://opencv.github.io/cvat/docs/manual/advanced/xml_format/

Basically, the bitmap

[0 0 1] [1 1 1] [0 0 1]

is encoded as 2 4 2 1 (two zeros, four ones, two zeros, one one), reading the rows one after another.
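
The encoding in this example can be reproduced with a few lines of Python. This is only a sketch of the scheme as described here, not the CVAT implementation; rle_encode is a hypothetical name, and the leading-zero convention for bitmaps that start with a 1 follows the reducer posted earlier in this thread.

```python
from itertools import groupby


def rle_encode(bits):
    """Run-length encode a flat 0/1 sequence; counts start with the zeros run."""
    counts = [len(list(run)) for _, run in groupby(bits)]
    # A sequence that starts with a 1 gets an empty run of zeros in front
    if bits and bits[0]:
        counts.insert(0, 0)
    return counts


bitmap = [
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
]
flat = [value for row in bitmap for value in row]  # rows one after another
print(rle_encode(flat))  # [2, 4, 2, 1]
```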

bsekachev commented 1 year ago

> Is there documentation for the mask format used in the cvat_sdk?

I'm not sure I understand this question. cvat_sdk can take different formats and pass them to the CVAT server, where they are parsed in different ways.

zhiltsov-max commented 1 year ago

Please check https://github.com/opencv/cvat/issues/6487#issuecomment-1640097518.

pHaeusler commented 1 year ago

@bsekachev - the RLE in the XML format doesn't contain the bounding box suffix. Instead, the width and height are added as attributes on the element in the XML.

When using the cvat_sdk, you must concatenate the RLE and the bounding box corners ([left, top, right, bottom]) to make a valid points array.

The ask is to add documentation on this.
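
To make the difference concrete, here is a rough sketch of converting a mask in the XML style into the points array the SDK expects. It assumes the XML element carries a comma-separated rle string plus left, top, width and height attributes, and that the corners in the points array are inclusive (as implied by the ybr + 1 / xbr + 1 slicing above). xml_mask_to_points is a hypothetical helper, not an SDK function.

```python
def xml_mask_to_points(rle, left, top, width, height):
    """Turn XML-style mask attributes into [rle..., left, top, right, bottom].

    Hypothetical helper; the attribute names and inclusive corners are assumptions.
    """
    counts = [int(c) for c in rle.split(",")]
    if sum(counts) != width * height:
        raise ValueError("RLE length does not match width * height")
    right = left + width - 1   # inclusive right edge
    bottom = top + height - 1  # inclusive bottom edge
    return counts + [left, top, right, bottom]


points = xml_mask_to_points("2, 4, 2, 1", left=10, top=20, width=3, height=3)
print(points)  # [2, 4, 2, 1, 10, 20, 12, 22]
```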