austinapatel opened this issue 1 year ago
There is nothing to "handle" in terms of running the evaluation per se, since we have ground-truth 3D pose.
In terms of whether the task is well-posed, the answer should be yes as long as: (1) your object has a fixed scale (i.e., you don't have the same Cheez-It box in different sizes), (2) a single camera model is used throughout the dataset (i.e., one set of camera intrinsics), and (3) you have a training set from which to "memorize" the 6D pose labels. This happens to be the case for the BOP datasets (at least in my impression), and also for the DexYCB dataset from this repo.
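To illustrate why conditions (1) and (2) resolve the depth ambiguity, here is a minimal sketch using the standard pinhole camera relation: once the object's physical size and the focal length are fixed, the object's apparent size in the image determines its depth. All numbers below are hypothetical, chosen only for illustration.

```python
# Sketch: with a known, fixed object scale and fixed camera intrinsics,
# RGB-only depth is recoverable from apparent size via the pinhole model.

def depth_from_apparent_size(focal_px, object_width_m, bbox_width_px):
    """Pinhole relation: bbox_width_px = focal_px * object_width_m / Z,
    so Z = focal_px * object_width_m / bbox_width_px."""
    return focal_px * object_width_m / bbox_width_px

f = 600.0      # hypothetical focal length in pixels (fixed intrinsics)
w_obj = 0.16   # hypothetical fixed physical width of the object, meters
w_px = 120.0   # measured bounding-box width in the image, pixels

z = depth_from_apparent_size(f, w_obj, w_px)
print(z)  # 0.8 (meters)
```

If the same object existed at multiple scales, or the intrinsics varied per image without being provided, the same apparent size would be consistent with many depths, and the RGB-only pose would be ambiguous.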
I'm looking at the 3 error metrics for the BOP challenge, and it appears that 2 of them rely on absolute position to produce valid scores. I see that there are RGB-only methods for object pose evaluation in the paper. How is the depth ambiguity associated with RGB-only methods handled with respect to the evaluations? Thanks!