cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

serverless result formats #6332

Closed: patrickwasp closed this issue 1 year ago

patrickwasp commented 1 year ago

Where can I find information about the format that serverless functions should return for automatic annotation? What "types" are available, and what format does CVAT expect for each of them?

Here is what I found by looking at the examples in the serverless folder; I'm not sure whether my interpretation is right:

Instance segmentation (mask_rcnn):

"confidence": a number between 0 and 1,
"label": the string representation of the class name,
"points": a flat list of coordinates representing a single polygon (x1, y1, x2, y2, x3, y3, ..., xn, yn),
"mask": a list of 0s and 1s representing a binary mask cropped around the object, with the last four elements holding the top-left and bottom-right coordinates of the object's bounding box (x_top_left, y_top_left, x_bottom_right, y_bottom_right),
"type": "mask",
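A minimal Python sketch of building one result in the mask_rcnn-style format described above. It assumes the full-image binary mask is a list of rows of 0/1 values and that the bounding-box corners are inclusive pixel indices; `to_cvat_mask` and `make_mask_result` are hypothetical helper names, not a CVAT API.

```python
from typing import Dict, List, Optional, Sequence


def to_cvat_mask(mask: Sequence[Sequence[int]], box: Sequence[int]) -> List[int]:
    """Crop a full-image 0/1 mask to box, flatten it row-major, append the box corners.

    Assumes box = (xtl, ytl, xbr, ybr) with an inclusive bottom-right corner.
    """
    xtl, ytl, xbr, ybr = box
    flat: List[int] = []
    for row in mask[ytl:ybr + 1]:
        flat.extend(int(v) for v in row[xtl:xbr + 1])
    flat.extend([xtl, ytl, xbr, ybr])
    return flat


def make_mask_result(label: str, confidence: float,
                     mask: Sequence[Sequence[int]], box: Sequence[int],
                     polygon: Optional[Sequence[float]] = None) -> Dict[str, object]:
    """Assemble one "mask" result; "points" is only included when a polygon is given."""
    result: Dict[str, object] = {
        "confidence": confidence,
        "label": label,
        "mask": to_cvat_mask(mask, box),
        "type": "mask",
    }
    if polygon is not None:
        result["points"] = list(polygon)
    return result
```

Making `polygon` optional reflects the maintainer's reply in this thread that the mask alone may be sufficient for type "mask".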

Object detection (detectron2 retinanet):

"confidence": a number between 0 and 1,
"label": the string representation of the class name,
"points": a list of 4 numbers holding the top-left and bottom-right coordinates of the object's bounding box (x_top_left, y_top_left, x_bottom_right, y_bottom_right),
"type": "rectangle",
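A minimal sketch of assembling a "rectangle" result with basic sanity checks. The helper names are hypothetical; `xywh_to_xyxy` is included only because many detectors emit (x, y, width, height), which must be converted to corner form before it goes into "points".

```python
from typing import Dict, List, Sequence, Union

Result = Dict[str, Union[str, float, List[float]]]


def make_rectangle_result(label: str, confidence: float, box: Sequence[float]) -> Result:
    """Build one "rectangle" result from (x_top_left, y_top_left, x_bottom_right, y_bottom_right)."""
    xtl, ytl, xbr, ybr = (float(v) for v in box)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if xbr < xtl or ybr < ytl:
        raise ValueError("bottom-right corner must not lie above or left of the top-left corner")
    return {
        "confidence": confidence,
        "label": label,
        "points": [xtl, ytl, xbr, ybr],
        "type": "rectangle",
    }


def xywh_to_xyxy(x: float, y: float, w: float, h: float) -> List[float]:
    """Convert (x, y, width, height) to (xtl, ytl, xbr, ybr)."""
    return [x, y, x + w, y + h]
```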

Image embeddings (sam):

"blob": image embeddings stored as a base64 string

where the embeddings have shape 1xCxHxW, C is the embedding dimension and (H, W) are the spatial dimensions of the SAM embedding (typically C=256, H=W=64).
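A minimal sketch of producing and checking such a base64 "blob" using only the standard library. The float32 dtype, little-endian byte order and row-major (C-order) flattening are assumptions, not something stated in this thread; `encode_embeddings` and `decode_embeddings` are hypothetical names.

```python
import base64
import struct
from typing import List, Sequence


def encode_embeddings(values: Sequence[float]) -> str:
    """Serialize a flattened 1xCxHxW tensor as little-endian float32, then base64."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_embeddings(blob: str, c: int = 256, h: int = 64, w: int = 64) -> List[float]:
    """Inverse of encode_embeddings; checks the byte count against 1*C*H*W float32 values."""
    raw = base64.b64decode(blob)
    count = c * h * w
    if len(raw) != 4 * count:
        raise ValueError(f"expected {4 * count} bytes, got {len(raw)}")
    return list(struct.unpack(f"<{count}f", raw))
```

With the typical C=256, H=W=64, the raw payload is 256 * 64 * 64 * 4 = 4,194,304 bytes before base64 encoding, so a size check like the one above quickly catches a wrong dtype or shape.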

bsekachev commented 1 year ago

What "types" are available, and what are the formats CVAT expects for each of them?

CVAT types: rectangle, polygon, points, polyline, ellipse, mask, tag and cuboid (the last two need to be re-checked).
rectangle: [xtl, ytl, xbr, ybr]
polygon, points, polyline: [x1, y1, x2, y2, x3, y3, ...]
ellipse: probably [cx, cy, right x, top y]
mask: [RLE-encoded ROI, xtl, ytl, xbr, ybr], where the last 4 values are the ROI coordinates
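A minimal sketch of a server-side check that a flat "points" list matches the layouts listed above. The exact lengths for rectangle and ellipse follow the reply; the minimum lengths for points, polyline and polygon are assumptions, and `check_points` / `to_pairs` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

# Flat coordinate lists [x1, y1, x2, y2, ...].
# Exact lengths follow the reply above; the minimums below are assumptions.
EXACT_LENGTH = {"rectangle": 4, "ellipse": 4}
MIN_LENGTH = {"points": 2, "polyline": 4, "polygon": 6}


def check_points(shape_type: str, points: Sequence[float]) -> None:
    """Raise ValueError if points does not fit the layout expected for shape_type."""
    n = len(points)
    if shape_type in EXACT_LENGTH:
        if n != EXACT_LENGTH[shape_type]:
            raise ValueError(f"{shape_type} expects {EXACT_LENGTH[shape_type]} numbers, got {n}")
    elif shape_type in MIN_LENGTH:
        if n % 2:
            raise ValueError(f"{shape_type} expects an even number of coordinates, got {n}")
        if n < MIN_LENGTH[shape_type]:
            raise ValueError(f"{shape_type} expects at least {MIN_LENGTH[shape_type]} numbers, got {n}")
    else:
        # mask, tag and cuboid use other layouts and are not checked here.
        raise ValueError(f"no flat-points layout known for {shape_type!r}")


def to_pairs(points: Sequence[float]) -> List[Tuple[float, float]]:
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) tuples."""
    return list(zip(points[0::2], points[1::2]))
```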

How would we represent an object made of multiple shapes, for example when it is occluded in the middle? Can "points" be a two-dimensional list?

Currently, only with masks. Multi-dimensional lists are not supported; this could be an enhancement. See #3676.
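A minimal sketch of sending an occluded object as a single mask: merge the visible parts, then crop to the combined bounding box. It assumes each part is already a same-sized full-image 0/1 mask (list of rows), row-major flattening and inclusive ROI corners; all helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def union_masks(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> Mask:
    """Pixel-wise OR of two same-sized full-image 0/1 masks."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    return [[1 if (x or y) else 0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mask_bbox(mask: Sequence[Sequence[int]]) -> Tuple[int, int, int, int]:
    """Inclusive (xtl, ytl, xbr, ybr) of the non-zero pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        raise ValueError("mask is empty")
    return min(xs), min(ys), max(xs), max(ys)


def to_cvat_mask(mask: Sequence[Sequence[int]]) -> List[int]:
    """Crop to the bounding box, flatten row-major, append the four ROI corners."""
    xtl, ytl, xbr, ybr = mask_bbox(mask)
    flat = [int(v) for row in mask[ytl:ybr + 1] for v in row[xtl:xbr + 1]]
    return flat + [xtl, ytl, xbr, ybr]
```

The gap between the two parts simply stays 0 inside the ROI, so one mask can carry a disconnected object without needing a nested "points" list.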

Do we need both points and mask data for type "mask"?

As far as I remember, for type "mask" only the mask is required. The client will convert it to a polygon using OpenCV if necessary.

SAM output is additionally handled by the sam plugin on the client side (cvat-ui/plugins/sam).

bsekachev commented 1 year ago

I was wrong about the mask: it is not RLE-encoded. The format you suggested is correct.
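Because the mask is a plain flattened list of 0/1 values followed by the ROI corners (not RLE), a consumer can recover the 2D ROI by reshaping. A minimal sketch, assuming row-major order and inclusive corners; `from_cvat_mask` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def from_cvat_mask(data: Sequence[int]) -> Tuple[Tuple[int, int, int, int], List[List[int]]]:
    """Split [m0, m1, ..., xtl, ytl, xbr, ybr] into the ROI box and a 2D 0/1 grid."""
    if len(data) < 5:
        raise ValueError("need at least one mask value plus four ROI coordinates")
    xtl, ytl, xbr, ybr = (int(v) for v in data[-4:])
    width, height = xbr - xtl + 1, ybr - ytl + 1  # inclusive corners (assumption)
    values = [int(v) for v in data[:-4]]
    if width <= 0 or height <= 0 or len(values) != width * height:
        raise ValueError(f"expected {width}x{height} mask values, got {len(values)}")
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return (xtl, ytl, xbr, ybr), rows
```

The length check is a cheap way to catch an off-by-one in the corner convention on either side of the serverless boundary.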