Missing mask annotation format example in documentation

AljoSt commented 1 year ago

My actions before raising this issue

[x] Read/searched the docs
[x] Searched past issues

First of all, thanks for maintaining such a great tool! I was trying out the new brush tool and got stuck figuring out the format of the mask in the annotations file (CVAT for images 1.1). The format documentation provides examples for polygons, polylines, etc., but lacks an example for masks. As the format is not super intuitive (I am not 100% sure what "rle" is or what "left", "top", etc. are for), it would be good to add one. Happy to do it if someone provides an explanation.

Edit 1: Googling would have helped me at least understand what RLE is: https://en.wikipedia.org/wiki/Run-length_encoding Might still make sense to put at least a link into the docs.

Edit 2: All the RLEs that I get in my annotations have an odd number of elements, which contradicts all the explanations that I found for RLE. Is this a bug or is this a different RLE version?

bsekachev commented 1 year ago

Hi @AljoSt

Thank you for the question. Adding an example to the documentation makes sense.

left, top, width, height are used because we do not want to store a whole-image bitmap for each small mask. Instead, we store only the image fragment [top:top+height) : [left:left+width)
RLE can be decoded this way (the code is in TypeScript, but quite intuitive even if you do not know it):

export function rle2Mask(rle: number[], width: number, height: number): number[] {
    const decoded = Array(width * height).fill(0); // create bitmap container
    const { length } = rle;  
    let decodedIdx = 0;
    let value = 0;
    let i = 0;

    while (i < length) {
        let count = rle[i];   // length of the current run of 0s or 1s
        while (count > 0) {   // write the current value (0 or 1) to the result container
            decoded[decodedIdx] = value;
            decodedIdx++;
            count--;
        }
        i++;
        value = Math.abs(value - 1); // invert 0 <--> 1
    }

    return decoded;
}
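
To connect the decoder with left, top, width and height, here is a minimal Python sketch of how a decoded fragment could be placed back into a full-size mask. It uses plain lists so it has no dependencies. The names decode_rle and paste_fragment, the image size, and the toy RLE are illustrative assumptions, not part of CVAT.

```python
def decode_rle(rle, width, height):
    """Decode run lengths that alternate 0, 1, 0, ... (starting with 0) into rows."""
    flat = []
    value = 0
    for run in rle:
        flat.extend([value] * run)
        value = 1 - value  # switch between 0 and 1
    if len(flat) != width * height:
        raise ValueError(f"RLE covers {len(flat)} pixels, expected {width * height}")
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def paste_fragment(fragment, left, top, image_width, image_height):
    """Hypothetical helper: copy the fragment into an all-zero full-size mask."""
    full = [[0] * image_width for _ in range(image_height)]
    for r, row in enumerate(fragment):
        for c, pixel in enumerate(row):
            full[top + r][left + c] = pixel
    return full


# Toy example: a 3x2 fragment placed at (left=1, top=1) in a 5x4 image.
fragment = decode_rle([1, 2, 1, 2], width=3, height=2)
full_mask = paste_fragment(fragment, left=1, top=1, image_width=5, image_height=4)
```

The length check in decode_rle is there on purpose: if the run lengths do not add up to width * height, the fragment size and the RLE disagree and the rows would come out shifted.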

You can also use one of several other formats for masks.

AljoSt commented 1 year ago

Thanks for the explanation. It looks like you are creating an unrolled image: you set the first rle[0] elements to 0, the following rle[1] elements to 1, the following rle[2] elements to 0, etc. (?). That would mean the sum of the elements in rle should equal width * height. This doesn't work for me. Here is an example:

<mask label="test_label" source="manual" occluded="0" rle="130, 28, 142, 30, 140, 32, 138, 34, 135, 37, 133, 39, 131, 41, 129, 43, 128, 44, 126, 46, 124, 47, 123, 16, 19, 14, 121, 15, 22, 14, 120, 14, 24, 13, 120, 13, 26, 12, 119, 13, 28, 11, 119, 12, 29, 11, 118, 12, 30, 11, 117, 13, 30, 12, 116, 13, 30, 12, 116, 12, 32, 11, 115, 12, 33, 11, 114, 13, 33, 11, 114, 12, 34, 12, 112, 12, 35, 12, 112, 12, 36, 11, 112, 12, 36, 11, 112, 11, 37, 11, 111, 12, 37, 11, 110, 12, 37, 12, 110, 11, 38, 12, 110, 11, 38, 12, 109, 12, 38, 12, 109, 12, 38, 11, 110, 11, 39, 11, 110, 11, 39, 11, 109, 12, 39, 11, 109, 11, 40, 11, 109, 11, 40, 11, 109, 11, 39, 12, 108, 12, 39, 12, 108, 11, 40, 12, 107, 12, 39, 13, 107, 12, 39, 12, 107, 13, 39, 11, 108, 12, 40, 11, 108, 12, 40, 11, 108, 12, 39, 12, 107, 13, 39, 12, 107, 12, 40, 11, 108, 11, 41, 11, 107, 12, 41, 11, 107, 11, 41, 12, 107, 11, 41, 12, 107, 11, 41, 12, 107, 11, 40, 12, 108, 11, 40, 11, 108, 12, 40, 11, 108, 11, 41, 11, 108, 11, 40, 12, 107, 12, 40, 12, 106, 13, 40, 12, 106, 12, 40, 13, 106, 11, 41, 12, 106, 12, 40, 12, 107, 12, 40, 12, 107, 12, 39, 13, 106, 13, 39, 13, 106, 12, 39, 14, 106, 11, 40, 13, 107, 11, 39, 13, 108, 11, 39, 12, 108, 12, 39, 11, 108, 13, 39, 11, 108, 12, 39, 12, 108, 11, 40, 12, 108, 11, 40, 11, 109, 11, 39, 12, 108, 12, 39, 12, 107, 13, 39, 12, 107, 12, 40, 12, 107, 11, 40, 13, 106, 12, 40, 12, 106, 13, 40, 11, 107, 12, 40, 12, 106, 12, 41, 11, 107, 12, 41, 11, 107, 12, 41, 11, 106, 13, 41, 11, 106, 12, 41, 12, 105, 12, 42, 12, 104, 13, 41, 12, 105, 13, 40, 13, 104, 14, 40, 13, 103, 14, 40, 14, 102, 14, 41, 14, 100, 15, 41, 14, 101, 15, 41, 13, 101, 15, 42, 12, 101, 15, 43, 11, 101, 15, 44, 11, 100, 15, 45, 11, 99, 16, 45, 11, 99, 15, 45, 12, 98, 15, 45, 13, 97, 15, 46, 13, 96, 14, 48, 12, 97, 14, 48, 11, 97, 15, 48, 11, 96, 15, 49, 11, 95, 15, 49, 12, 93, 16, 7, 5, 38, 12, 92, 16, 7, 7, 36, 13, 91, 16, 7, 9, 35, 12, 91, 16, 7, 11, 34, 11, 90, 18, 7, 11, 33, 12, 89, 18, 8, 11, 33, 12, 88, 18, 9, 12, 
32, 11, 87, 19, 10, 13, 31, 11, 86, 18, 13, 12, 30, 12, 84, 19, 15, 11, 30, 12, 83, 19, 16, 11, 29, 13, 82, 19, 17, 11, 29, 12, 81, 20, 18, 11, 28, 13, 80, 20, 19, 11, 28, 13, 79, 20, 20, 11, 28, 12, 78, 19, 23, 11, 28, 11, 78, 19, 24, 11, 28, 11, 77, 19, 25, 10, 29, 11, 75, 20, 26, 10, 29, 11, 74, 18, 29, 10, 29, 11, 73, 18, 30, 10, 29, 11, 71, 19, 31, 10, 29, 11, 69, 19, 33, 10, 29, 11, 68, 19, 34, 10, 29, 11, 66, 20, 35, 11, 28, 11, 64, 21, 36, 11, 28, 11, 62, 22, 37, 11, 28, 11, 60, 23, 38, 11, 28, 11, 58, 24, 39, 11, 28, 11, 56, 23, 42, 11, 28, 11, 54, 23, 43, 12, 28, 12, 52, 23, 44, 12, 28, 13, 49, 23, 46, 12, 28, 13, 47, 23, 48, 12, 28, 13, 46, 23, 49, 11, 30, 12, 43, 24, 51, 11, 30, 12, 42, 23, 53, 11, 31, 11, 40, 24, 54, 11, 31, 12, 38, 22, 57, 11, 32, 11, 37, 21, 59, 11, 32, 12, 35, 20, 61, 11, 32, 12, 35, 18, 63, 11, 32, 12, 34, 18, 64, 11, 33, 11, 34, 17, 65, 11, 33, 11, 34, 15, 67, 11, 33, 11, 29, 19, 69, 9, 34, 11, 28, 18, 72, 7, 35, 11, 27, 18, 74, 5, 36, 11, 26, 18, 115, 12, 26, 17, 115, 13, 26, 11, 120, 14, 26, 11, 119, 15, 26, 11, 117, 17, 26, 11, 117, 16, 27, 11, 113, 19, 27, 12, 111, 20, 28, 12, 108, 22, 29, 12, 107, 22, 30, 14, 12, 56, 34, 23, 32, 139, 32, 138, 33, 137, 34, 136, 35, 133, 38, 132, 40, 126, 46, 124, 48, 122, 51, 56, 4, 5, 1, 50, 61, 18, 55, 18, 3, 13, 53" left="846" top="350" width="170" height="180" z_order="0">

170 * 180 = 30600, but sum(rle) = 30951.

However, 171 * 181 = 30951. This held true for multiple instances that I tested. So... that might be a bug?
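
The arithmetic can be checked with a small Python sketch. It takes the RLE sum reported above (30951) and tests which off-by-one variants of the written width and height give that area. The function name candidate_sizes is made up for this illustration.

```python
def candidate_sizes(rle_sum, width, height):
    """Return the off-by-one (width, height) candidates whose area equals rle_sum."""
    candidates = [
        (width, height),
        (width + 1, height),
        (width, height + 1),
        (width + 1, height + 1),
    ]
    return [(w, h) for (w, h) in candidates if w * h == rle_sum]


# width="170" height="180" come from the <mask> element, 30951 is sum(rle).
matches = candidate_sizes(30951, 170, 180)
print(matches)  # [(171, 181)]
```

Only the candidate with both dimensions increased by one matches, which is consistent with the observation above.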

AljoSt commented 1 year ago

Just in case someone comes across this, here is the same decoder in Python:

import numpy as np

def rle2Mask(rle: list[int], width: int, height: int) -> np.ndarray:
    decoded = [0] * (width * height)  # create bitmap container
    decoded_idx = 0
    value = 0  # the first run always counts zeros

    for v in rle:
        decoded[decoded_idx:decoded_idx + v] = [value] * v
        decoded_idx += v
        value = abs(value - 1)  # invert 0 <--> 1

    decoded = np.array(decoded, dtype=np.uint8)
    decoded = decoded.reshape((height, width))  # reshape to image size

    return decoded
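
On the question of odd-length RLEs: the decoder only reads run lengths and always starts with zeros, so the number of elements depends on which value the fragment starts and ends with, not on any pairing of (value, count). A minimal sketch of the inverse direction shows this. The encoder below is an assumption written to mirror the decoder, not CVAT's actual implementation, and mask_to_rle is a made-up name.

```python
def mask_to_rle(flat_mask):
    """Encode a flat 0/1 sequence as run lengths alternating 0, 1, 0, ... starting with 0."""
    rle = []
    current = 0  # the first run always counts zeros (it may have length 0)
    run = 0
    for pixel in flat_mask:
        if pixel == current:
            run += 1
        else:
            rle.append(run)
            current = pixel
            run = 1
    rle.append(run)  # close the last run
    return rle


ends_with_zeros = mask_to_rle([0, 0, 1, 1, 1, 0])  # [2, 3, 1]: odd number of runs
ends_with_ones = mask_to_rle([0, 1])               # [1, 1]: even number of runs
starts_with_one = mask_to_rle([1, 1, 0])           # [0, 2, 1]: leading zero-length run
```

With this scheme, a fragment that starts and ends on background pixels produces an odd number of elements.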

bsekachev commented 1 year ago

So... that might be a bug?

Hmm, let me check that.

bsekachev commented 1 year ago

There is probably a bug with the "height" and "width" values, because 171 * 181 = 30951.

So, the issue may be in writing these values to the annotation file, because internally we use top, left, right, bottom.
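
One way such an off-by-one could arise is sketched below in Python. This is purely a hypothesis for illustration and is not confirmed anywhere in this thread: it assumes right and bottom are inclusive pixel indices, so the true size is right - left + 1, while the written value would be right - left. The function name and the derived right and bottom values are assumptions.

```python
def size_from_inclusive_bounds(left, top, right, bottom):
    """Fragment size when right and bottom are the last pixel indices (inclusive)."""
    return right - left + 1, bottom - top + 1


# left, top, width, height are from the example mask above.
# right and bottom are ASSUMED here to be left + width and top + height.
left, top, written_width, written_height = 846, 350, 170, 180
right, bottom = left + written_width, top + written_height

actual_width, actual_height = size_from_inclusive_bounds(left, top, right, bottom)
print(actual_width, actual_height, actual_width * actual_height)  # 171 181 30951
```

Under that assumption the area matches sum(rle) from the example, but the real cause would need to be checked in the code that writes the annotation file.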

cvat-ai / cvat

Missing mask annotation format example in documentation #5828