Closed patrickwasp closed 1 year ago
What "types" are available, and what are the formats CVAT expects for each of them?
CVAT types: rectangle, polygon, points, polyline, ellipse, mask, tag and cuboid (the last two need to be re-checked).
rectangle: [xtl, ytl, xbr, ybr]
polygon, points, polyline: [x1, y1, x2, y2, x3, y3, ...]
ellipse: probably [cx, cy, right x, top y]
mask: [RLE-encoded ROI, xtl, ytl, xbr, ybr], where the last four values are the ROI coordinates
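To make the flat layouts above concrete, here is a minimal Python sketch. The helper names (`rectangle_points`, `flatten_vertices`, `make_shape`) are hypothetical and not part of CVAT. The "type" and "points" keys are the ones discussed in this thread; the "label" key is an assumption added for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def rectangle_points(xtl: float, ytl: float, xbr: float, ybr: float) -> List[float]:
    """Rectangle layout from the list above: [xtl, ytl, xbr, ybr]."""
    return [xtl, ytl, xbr, ybr]


def flatten_vertices(vertices: Sequence[Tuple[float, float]]) -> List[float]:
    """Polygon / points / polyline layout: [x1, y1, x2, y2, ...]."""
    flat: List[float] = []
    for x, y in vertices:
        flat.extend([x, y])
    return flat


def make_shape(shape_type: str, points: List[float], label: str) -> Dict[str, object]:
    """Hypothetical result entry; the "label" key is an assumption."""
    return {"type": shape_type, "points": points, "label": label}


box = make_shape("rectangle", rectangle_points(10, 20, 110, 220), "car")
triangle = make_shape("polygon", flatten_vertices([(0, 0), (50, 0), (25, 40)]), "sign")
```

Note that every "points" value is a single flat list, which matches the statement elsewhere in this thread that multi-dimensional lists are not supported.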
How would we represent an object with multiple shapes, for example when there is an occlusion in the middle of it? Can "points" be a two-dimensional list?
Currently only with masks. Multi-dimensional lists are not supported. That could be an enhancement. See #3676
Do we need both points and mask data for type "mask"?
As far as I remember, for type "mask" only the mask itself is obligatory. The client will convert it to a polygon using OpenCV if necessary.
SAM output is additionally handled by the SAM plugin on the client side (cvat-ui/plugins/sam).
I was wrong about mask: it is not RLE-encoded. The option you suggested is correct.
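Reading the mask entry from the earlier list without the RLE step, a mask would be the ROI pixel values followed by the four ROI coordinates. A minimal sketch of that layout, assuming row-major flattening, 0/1 pixel values and inclusive bottom-right coordinates (none of these details is confirmed in this thread, and `encode_mask` is a hypothetical helper):

```python
from typing import List


def encode_mask(mask: List[List[int]]) -> List[int]:
    """Flatten the tight ROI of a binary mask and append [xtl, ytl, xbr, ybr].

    Assumptions: row-major order, 0/1 values, inclusive xbr and ybr.
    """
    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask has no foreground pixels")
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]

    ytl, ybr = rows[0], rows[-1]
    xtl, xbr = cols[0], cols[-1]

    # Uncompressed ROI pixels, one value per pixel (no RLE).
    flat = [int(bool(v)) for row in mask[ytl:ybr + 1] for v in row[xtl:xbr + 1]]
    return flat + [xtl, ytl, xbr, ybr]


example = encode_mask([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
])
```

Here the ROI is the 2x2 region spanning columns 1 to 2 and rows 1 to 2, so the result holds four pixel values followed by the four coordinates.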
Where would I find information about the format serverless functions should return for automatic annotation? What "types" are available, and what are the formats CVAT expects for each of them?
Here is what I found by looking at the examples in the serverless folder. I'm not sure if my interpretation is right:
instance segmentation: mask_rcnn
object detection: detectron2 retinanet
image embeddings: sam
For sam, the embeddings have shape 1xCxHxW, where C is the embedding dimension and (H, W) are the spatial dimensions of the SAM embedding (typically C=256, H=W=64).