cchamber opened this issue 6 years ago
@cchamber If visibility == 0, the keypoint is not labeled (not in the image). If visibility == 1, the keypoint is in the image but not visible, e.g. hidden behind an object. If visibility == 2, the keypoint is clearly visible, not hidden.
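To make the three flags concrete, here is a minimal sketch of the convention described above (the helper name and label strings are my own, not part of the COCO API):

```python
# Mapping of COCO visibility flags to their meaning (labels are illustrative).
VISIBILITY_LABELS = {
    0: "not labeled (keypoint not in the image)",
    1: "labeled but not visible (e.g. occluded by an object)",
    2: "labeled and clearly visible",
}

def describe_visibility(v):
    """Return a human-readable description of a visibility flag."""
    return VISIBILITY_LABELS.get(v, "invalid flag")
```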
Is visibility == 1 still used for training and evaluation?
I wonder whether occluded and self-occluded keypoints should be annotated.
Looking at some COCO person keypoint examples, it looks like:
Here is the description:
" ... the visibility flags of the ground truth (the detector's predicted visibility [is] not used)... These similarities are averaged over all labeled keypoints (keypoints for which visibility > 0). Predicted keypoints that are not labeled (visibility=0) do not affect the [Evaluation]"
Ground truth visibility is used for training and evaluation.
During training, only keypoints labeled in the ground truth (v > 0) are included in the loss.
https://github.com/facebookresearch/Detectron/blob/8170b25b425967f8f1c7d715bea3c5b8d9536cd8/detectron/utils/keypoints.py#L181
https://github.com/facebookresearch/Detectron/blob/8170b25b425967f8f1c7d715bea3c5b8d9536cd8/detectron/roi_data/keypoint_rcnn.py#L75-L91
https://github.com/facebookresearch/Detectron/blob/8170b25b425967f8f1c7d715bea3c5b8d9536cd8/detectron/modeling/keypoint_rcnn_heads.py#L122-L127
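The masking in those links can be sketched roughly as follows. This is not Detectron's actual code, just a minimal NumPy illustration of a per-keypoint cross-entropy over flattened heatmap logits in which v == 0 keypoints contribute nothing to the loss:

```python
import numpy as np

def masked_keypoint_loss(logits, targets, visibility):
    """Illustrative masked keypoint loss.

    logits: (K, H*W) heatmap scores per keypoint
    targets: (K,) target heatmap bin index per keypoint
    visibility: (K,) ground-truth v flags
    """
    valid = visibility > 0  # only labeled keypoints enter the loss
    if not valid.any():
        return 0.0
    # Numerically stable softmax cross-entropy per keypoint.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_kp = -log_probs[np.arange(len(targets)), targets]
    return per_kp[valid].mean()  # v == 0 keypoints are masked out
```

Because of the mask, arbitrarily bad predictions at v == 0 keypoints leave the loss unchanged.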
Only keypoints labeled in the ground truth (v > 0) are included in OKS calculations. https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L230-L231
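A minimal sketch of that OKS computation, mirroring the pycocotools logic (the sigma values you would pass in are the per-keypoint constants; the ones used in the test below are illustrative, not the official values):

```python
import numpy as np

def oks(gt_xy, gt_v, dt_xy, area, sigmas):
    """Object Keypoint Similarity, averaged over labeled keypoints only.

    gt_xy, dt_xy: (K, 2) ground-truth and predicted coordinates
    gt_v: (K,) ground-truth visibility flags
    area: ground-truth object area
    sigmas: (K,) per-keypoint falloff constants
    """
    vars_ = (2 * sigmas) ** 2
    d2 = ((gt_xy - dt_xy) ** 2).sum(axis=1)        # squared distances
    e = d2 / vars_ / (area + np.spacing(1)) / 2
    labeled = gt_v > 0                             # v == 0 keypoints ignored
    return np.exp(-e[labeled]).mean() if labeled.any() else 0.0
```

A prediction that is perfect at every labeled keypoint scores OKS = 1 no matter where the v == 0 keypoints are predicted.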
In other words, predictions for keypoints labeled v = 0 are not penalized or rewarded in either the loss or the metrics.
It would make sense to label self-occluded points with 1 (or even add a new class), but I don't think COCO enforced this.
Are self-occluded keypoints going to have 1 as the visibility flag?
Same question! What is the recommended visibility flag for a blurred or self-occluded joint?
Hey all! I'm also wrestling with uncertainty on how/when to use the visibility flags. Some use cases I'm unsure about (I'm focused here on noses):
Category | Visibility | Use Case |
---|---|---|
1. "Soft" Self-Occlusion | 2? | A person's own hair is occluding their nose. |
2. "Medium" Self-Occlusion | 0? | A person's hand is occluding their own nose. |
3. "Hard" Self-Occlusion | 0? | A person's head is turned away from the camera so the back of their head is occluding their nose. |
4. Other Person Occlusion | 1? | A person's nose is occluded by another person's hand. |
5. Wearable Occlusion | 1? | A person's nose is occluded by a wearable like a mask. |
6. External Object Occlusion | 1? | A person's nose is occluded by an object like a tree branch, or by a car's sun visor when viewing the person through the windshield. |
7. Blur | 2? | A face is present in the distance and I can guess the location of a keypoint like nose, but the image is too blurry for me to clearly denote the nose. |
8. Low Exposure | 2? | A face is present and I can guess the location of a keypoint like the nose, but the image is underexposed (dark) so I can barely make out the nose. |
Any insights from those who have struggled themselves with applying visibility labels? @cchamber - had you come up with a consistent definition that you went by?
Some thoughts: Why not simply annotate everything that a human annotator can guess, including occluded or blurred keypoints?
@cchamber I have seen your great work on the infants dataset, and I wonder how you solved the visibility flag problem? Thanks!
I have a general question about the labelling in the COCO dataset. For the keypoints, the ordering of labels is [x, y, visibility, x, y, visibility, ...]. What does a label of visibility = 1 mean? This clearly covers occlusion, but does it also refer to parts that are not visible because of blur? In my case I am dealing with frames of videos, and joints are often blurred because of movement. Thank you!
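For reference, the flat list can be split into (x, y, v) triplets like this (the annotation values below are made up for illustration):

```python
# COCO stores keypoints as a flat list [x1, y1, v1, x2, y2, v2, ...].
kp = [142, 309, 1, 177, 320, 2, 191, 398, 0]  # made-up annotation values

# Group into (x, y, v) triplets, one per keypoint.
triplets = [tuple(kp[i:i + 3]) for i in range(0, len(kp), 3)]

# Keep only labeled keypoints (v > 0); v == 0 entries carry no annotation.
labeled = [(x, y) for (x, y, v) in triplets if v > 0]
```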