cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Missing mask annotation format example in documentation #5828

Closed AljoSt closed 1 year ago

AljoSt commented 1 year ago

My actions before raising this issue

First of all, thanks for maintaining such a great tool! I was trying out the new brush tool and got stuck figuring out the format of the mask in the annotations file (CVAT for images 1.1). The format documentation provides examples for polygons, polylines, etc., but lacks examples for masks. As the format is not very intuitive (I am not 100% sure what "rle" is and what "left", "top", etc. are for), it would be good if an example could be added. Happy to do it if someone provides an explanation.

Edit 1: Googling would have helped me at least understand what RLE is: https://en.wikipedia.org/wiki/Run-length_encoding It might still make sense to put at least a link into the docs.

Edit 2: All the RLEs that I get in my annotations have an odd number of elements, which contradicts all the explanations of RLE that I found. Is this a bug, or is this a different RLE variant?

bsekachev commented 1 year ago

Hi @AljoSt

Thank you for the question. Adding an example to the documentation makes sense.

// Decode an RLE into a flat, row-major bitmap of 0s and 1s.
// The runs alternate between 0 and 1, and the first run always counts 0s.
export function rle2Mask(rle: number[], width: number, height: number): number[] {
    const decoded = Array(width * height).fill(0); // create bitmap container
    const { length } = rle;
    let decodedIdx = 0;
    let value = 0;
    let i = 0;

    while (i < length) {
        let count = rle[i];   // length of the current run of 0s or 1s
        while (count > 0) {   // write the current value (0 or 1) to the result container
            decoded[decodedIdx] = value;
            decodedIdx++;
            count--;
        }
        i++;
        value = 1 - value; // toggle 0 <--> 1 for the next run
    }

    return decoded;
}

AljoSt commented 1 year ago

Thanks for the explanation. It looks like you are creating an unrolled (flattened) image: the first rle[0] elements are set to 0, the following rle[1] elements to 1, the following rle[2] elements to 0, and so on. That would mean the sum of the elements in rle should equal width * height, but this doesn't work for me. Here is an example:

<mask label="test_label" source="manual" occluded="0" rle="130, 28, 142, 30, 140, 32, 138, 34, 135, 37, 133, 39, 131, 41, 129, 43, 128, 44, 126, 46, 124, 47, 123, 16, 19, 14, 121, 15, 22, 14, 120, 14, 24, 13, 120, 13, 26, 12, 119, 13, 28, 11, 119, 12, 29, 11, 118, 12, 30, 11, 117, 13, 30, 12, 116, 13, 30, 12, 116, 12, 32, 11, 115, 12, 33, 11, 114, 13, 33, 11, 114, 12, 34, 12, 112, 12, 35, 12, 112, 12, 36, 11, 112, 12, 36, 11, 112, 11, 37, 11, 111, 12, 37, 11, 110, 12, 37, 12, 110, 11, 38, 12, 110, 11, 38, 12, 109, 12, 38, 12, 109, 12, 38, 11, 110, 11, 39, 11, 110, 11, 39, 11, 109, 12, 39, 11, 109, 11, 40, 11, 109, 11, 40, 11, 109, 11, 39, 12, 108, 12, 39, 12, 108, 11, 40, 12, 107, 12, 39, 13, 107, 12, 39, 12, 107, 13, 39, 11, 108, 12, 40, 11, 108, 12, 40, 11, 108, 12, 39, 12, 107, 13, 39, 12, 107, 12, 40, 11, 108, 11, 41, 11, 107, 12, 41, 11, 107, 11, 41, 12, 107, 11, 41, 12, 107, 11, 41, 12, 107, 11, 40, 12, 108, 11, 40, 11, 108, 12, 40, 11, 108, 11, 41, 11, 108, 11, 40, 12, 107, 12, 40, 12, 106, 13, 40, 12, 106, 12, 40, 13, 106, 11, 41, 12, 106, 12, 40, 12, 107, 12, 40, 12, 107, 12, 39, 13, 106, 13, 39, 13, 106, 12, 39, 14, 106, 11, 40, 13, 107, 11, 39, 13, 108, 11, 39, 12, 108, 12, 39, 11, 108, 13, 39, 11, 108, 12, 39, 12, 108, 11, 40, 12, 108, 11, 40, 11, 109, 11, 39, 12, 108, 12, 39, 12, 107, 13, 39, 12, 107, 12, 40, 12, 107, 11, 40, 13, 106, 12, 40, 12, 106, 13, 40, 11, 107, 12, 40, 12, 106, 12, 41, 11, 107, 12, 41, 11, 107, 12, 41, 11, 106, 13, 41, 11, 106, 12, 41, 12, 105, 12, 42, 12, 104, 13, 41, 12, 105, 13, 40, 13, 104, 14, 40, 13, 103, 14, 40, 14, 102, 14, 41, 14, 100, 15, 41, 14, 101, 15, 41, 13, 101, 15, 42, 12, 101, 15, 43, 11, 101, 15, 44, 11, 100, 15, 45, 11, 99, 16, 45, 11, 99, 15, 45, 12, 98, 15, 45, 13, 97, 15, 46, 13, 96, 14, 48, 12, 97, 14, 48, 11, 97, 15, 48, 11, 96, 15, 49, 11, 95, 15, 49, 12, 93, 16, 7, 5, 38, 12, 92, 16, 7, 7, 36, 13, 91, 16, 7, 9, 35, 12, 91, 16, 7, 11, 34, 11, 90, 18, 7, 11, 33, 12, 89, 18, 8, 11, 33, 12, 88, 18, 9, 12, 
32, 11, 87, 19, 10, 13, 31, 11, 86, 18, 13, 12, 30, 12, 84, 19, 15, 11, 30, 12, 83, 19, 16, 11, 29, 13, 82, 19, 17, 11, 29, 12, 81, 20, 18, 11, 28, 13, 80, 20, 19, 11, 28, 13, 79, 20, 20, 11, 28, 12, 78, 19, 23, 11, 28, 11, 78, 19, 24, 11, 28, 11, 77, 19, 25, 10, 29, 11, 75, 20, 26, 10, 29, 11, 74, 18, 29, 10, 29, 11, 73, 18, 30, 10, 29, 11, 71, 19, 31, 10, 29, 11, 69, 19, 33, 10, 29, 11, 68, 19, 34, 10, 29, 11, 66, 20, 35, 11, 28, 11, 64, 21, 36, 11, 28, 11, 62, 22, 37, 11, 28, 11, 60, 23, 38, 11, 28, 11, 58, 24, 39, 11, 28, 11, 56, 23, 42, 11, 28, 11, 54, 23, 43, 12, 28, 12, 52, 23, 44, 12, 28, 13, 49, 23, 46, 12, 28, 13, 47, 23, 48, 12, 28, 13, 46, 23, 49, 11, 30, 12, 43, 24, 51, 11, 30, 12, 42, 23, 53, 11, 31, 11, 40, 24, 54, 11, 31, 12, 38, 22, 57, 11, 32, 11, 37, 21, 59, 11, 32, 12, 35, 20, 61, 11, 32, 12, 35, 18, 63, 11, 32, 12, 34, 18, 64, 11, 33, 11, 34, 17, 65, 11, 33, 11, 34, 15, 67, 11, 33, 11, 29, 19, 69, 9, 34, 11, 28, 18, 72, 7, 35, 11, 27, 18, 74, 5, 36, 11, 26, 18, 115, 12, 26, 17, 115, 13, 26, 11, 120, 14, 26, 11, 119, 15, 26, 11, 117, 17, 26, 11, 117, 16, 27, 11, 113, 19, 27, 12, 111, 20, 28, 12, 108, 22, 29, 12, 107, 22, 30, 14, 12, 56, 34, 23, 32, 139, 32, 138, 33, 137, 34, 136, 35, 133, 38, 132, 40, 126, 46, 124, 48, 122, 51, 56, 4, 5, 1, 50, 61, 18, 55, 18, 3, 13, 53" left="846" top="350" width="170" height="180" z_order="0">

170 * 180 = 30600, but sum(rle) = 30951.

However, 171 * 181 is exactly 30951. This held true for multiple instances that I tested. So... that might be a bug?
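
The check described above can be sketched in a few lines of Python. This is only an illustration: `check_mask_size` is a hypothetical helper, not part of CVAT, and the tiny `<mask>` element below is made up (and given a closing tag so that it parses on its own). It compares the sum of the runs against `width * height` as written and against `(width + 1) * (height + 1)`.

```python
import xml.etree.ElementTree as ET


def check_mask_size(mask_xml: str) -> dict:
    """Compare sum(rle) with the area implied by the width/height attributes."""
    elem = ET.fromstring(mask_xml)
    rle = [int(v) for v in elem.get("rle").split(",")]
    width = int(elem.get("width"))
    height = int(elem.get("height"))
    total = sum(rle)
    return {
        "sum_rle": total,
        "matches_as_written": total == width * height,
        # Hypothesis from this thread: the stored values are one pixel too small.
        "matches_plus_one": total == (width + 1) * (height + 1),
    }


# Made-up element: the runs cover 6 pixels, but width * height is only 2.
example = (
    '<mask label="demo" rle="1, 2, 3" left="0" top="0" '
    'width="2" height="1" z_order="0"></mask>'
)
result = check_mask_size(example)
print(result)
```

Running the same helper on a real exported element should show whether it follows the pattern reported here.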

AljoSt commented 1 year ago

Just in case someone comes across this, here is the same code in Python:

import numpy as np


def rle2Mask(rle: list[int], width: int, height: int) -> np.ndarray:
    decoded = [0] * (width * height)  # create bitmap container
    decoded_idx = 0
    value = 0  # the first run always counts 0s

    for v in rle:
        decoded[decoded_idx:decoded_idx + v] = [value] * v
        decoded_idx += v
        value = 1 - value  # toggle 0 <--> 1 for the next run

    decoded = np.array(decoded, dtype=np.uint8)
    decoded = decoded.reshape((height, width))  # reshape to image size

    return decoded

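
For completeness, here is a minimal sketch of the inverse direction, assuming the same convention as the decoders in this thread (runs alternate and the first run counts 0s). CVAT's own encoder is not shown here, so `mask_to_rle` and `rle_to_flat` are hypothetical helpers written in plain Python for illustration. The sketch also shows why the number of runs can be odd: that happens whenever the mask ends on a background pixel.

```python
def mask_to_rle(rows: list[list[int]]) -> list[int]:
    """Encode a 2D 0/1 mask (list of rows) as alternating run lengths."""
    flat = [1 if pixel else 0 for row in rows for pixel in row]
    rle = []
    value = 0  # by convention the first run counts 0s (it may have length 0)
    count = 0
    for pixel in flat:
        if pixel == value:
            count += 1
        else:
            rle.append(count)
            value = 1 - value
            count = 1
    rle.append(count)
    return rle


def rle_to_flat(rle: list[int]) -> list[int]:
    """Decode alternating run lengths back into a flat list of 0s and 1s."""
    flat = []
    value = 0
    for run in rle:
        flat.extend([value] * run)
        value = 1 - value
    return flat


ends_on_background = [[0, 1, 1], [0, 0, 0]]
starts_on_foreground = [[1, 0]]

rle_a = mask_to_rle(ends_on_background)    # [1, 2, 3]: odd number of runs
rle_b = mask_to_rle(starts_on_foreground)  # [0, 1, 1]: leading zero-length run
print(rle_a, rle_b)
```

In both cases the runs sum to the number of pixels in the mask, which is the property the width and height check relies on.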
bsekachev commented 1 year ago

"So... that might be a bug?"

Hmm, let me check that.

bsekachev commented 1 year ago

There is probably a bug in the "height" and "width" values, because 171 * 181 = 30951.

So the issue may be in how these values are written to the annotation file, because internally we use top, left, right, bottom.
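
One way this kind of off-by-one can arise is sketched below. It is an assumption for illustration only, not a confirmed description of CVAT's internals: if right and bottom are inclusive pixel coordinates, the box spans `right - left + 1` columns, so writing `right - left` as the width loses one pixel. `bbox_size` is a hypothetical helper, and the right/bottom values are derived from the attributes in the example above rather than taken from CVAT.

```python
def bbox_size(left: int, top: int, right: int, bottom: int,
              inclusive: bool = True) -> tuple[int, int]:
    """Return (width, height) of a box; inclusive bounds add one pixel per axis."""
    extra = 1 if inclusive else 0
    return right - left + extra, bottom - top + extra


# Attributes from the example element in this thread.
left, top, width_attr, height_attr = 846, 350, 170, 180

# Assumption: the exported width/height were computed as right - left and bottom - top.
right, bottom = left + width_attr, top + height_attr

exclusive_size = bbox_size(left, top, right, bottom, inclusive=False)
inclusive_size = bbox_size(left, top, right, bottom, inclusive=True)
print(exclusive_size, exclusive_size[0] * exclusive_size[1])
print(inclusive_size, inclusive_size[0] * inclusive_size[1])
```

Under this assumption, the inclusive size gives 171 * 181 = 30951, which matches sum(rle) from the example.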