Closed by monacv 3 years ago
The ambiguity here is defined as follows: given n images of "catch frisbee" shot from different viewpoints, a model trained with supervised learning gets confused because the 2D patterns are so diverse even though the semantics are the same. In 3D, this viewpoint-induced ambiguity is largely alleviated.
We built the Ambiguous-HOI benchmark by comparing the similarities of the 2D human poses belonging to the same HOI and choosing the ones that are as different as possible (i.e., very confusing in 2D), in order to test the disambiguation ability of models. You can look at several images with the same HOI (like "catch frisbee") side by side for a more intuitive comparison.
For more details, please refer to our paper (Fig. 1, Sec. 5.1, and Sec. E of the supplementary material).
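To make the selection idea above more concrete, here is a minimal sketch of picking, for one HOI category, the 2D poses that differ most from each other. This is not the authors' actual procedure: the normalization, the Euclidean distance, the greedy farthest-point selection, and all function names (`normalize_pose`, `pairwise_pose_distance`, `select_most_different`) are assumptions made purely for illustration.

```python
import numpy as np


def normalize_pose(pose):
    """Center a (K, 2) keypoint array and scale it to unit norm (assumed normalization)."""
    centered = pose - pose.mean(axis=0)
    scale = np.linalg.norm(centered)
    return centered / scale if scale > 0 else centered


def pairwise_pose_distance(poses):
    """Return an (n, n) matrix of Euclidean distances between normalized poses."""
    flat = np.stack([normalize_pose(p).ravel() for p in poses])
    diff = flat[:, None, :] - flat[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def select_most_different(poses, k):
    """Greedily pick k pose indices that are far apart from each other."""
    dist = pairwise_pose_distance(poses)
    # Start from the pose with the largest mean distance to all others.
    selected = [int(dist.mean(axis=1).argmax())]
    while len(selected) < min(k, len(poses)):
        # Distance from every pose to its nearest already-selected pose.
        nearest = dist[:, selected].min(axis=1)
        nearest[selected] = -1.0  # never re-pick an already selected pose
        selected.append(int(nearest.argmax()))
    return selected
```

Here `poses` would hold the 2D keypoints of all images labeled with the same HOI (for example "catch frisbee"), and the returned indices point to the images whose 2D poses look least alike despite sharing the same label.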
Could you be a bit more specific about what constitutes an ambiguous HOI? For example, in the image below, I am not sure why it is ambiguous.
Thanks a bunch, Mona