andyzeng / arc-robot-vision

MIT-Princeton Vision Toolbox for Robotic Pick-and-Place at the Amazon Robotics Challenge 2017 - Robotic Grasping and One-shot Recognition of Novel Objects with Deep Learning.
http://arc.cs.princeton.edu
Apache License 2.0

KvN accuracy computation #10

Open achiatti opened 4 years ago

achiatti commented 4 years ago

Thanks for providing these great data and resources.

I have been trying to better understand how the KvN (known-vs-novel) accuracy is first computed in your script ./image-matching/evaluateModel.m and then reused to choose between K-net and N-net in your second script, evaluateTwoStage.m.

The specific portion of code I am confused about is:

```matlab
% Sweep thresholds and keep the one that best separates known from novel objects.
bestKnownNovelAcc = 0;
bestKnownNovelThreshold = 0;
for threshold = 0:0.01:1.2
    % An object is predicted "novel" when its nearest-neighbour distance exceeds
    % the threshold; accuracy is scored against the ground-truth labels in
    % testIsKnownObj.
    knownNovelAcc = sum((predNnDist > threshold) == ~testIsKnownObj)/length(testIsKnownObj);
    if bestKnownNovelAcc < knownNovelAcc
        bestKnownNovelAcc = knownNovelAcc;
        bestKnownNovelThreshold = threshold;
    end
end
```
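To make sure I am reading the sweep correctly, here is a NumPy sketch of what I believe it does (the variable names mirror the MATLAB ones; the data below is a toy example of mine, not from the repo):

```python
import numpy as np

# Toy stand-ins for the script's variables (illustrative only):
# pred_nn_dist  - nearest-neighbour distance of each test image in feature space
# test_is_known - ground-truth flag: True if the object was seen during training
pred_nn_dist = np.array([0.2, 0.9, 0.4, 1.1, 0.3, 0.8])
test_is_known = np.array([True, False, True, False, True, False])

best_acc, best_threshold = 0.0, 0.0
for threshold in np.arange(0, 1.21, 0.01):
    # Predict "novel" when the nearest-neighbour distance exceeds the threshold,
    # then score against the ground-truth known/novel labels.
    acc = np.mean((pred_nn_dist > threshold) == ~test_is_known)
    if acc > best_acc:
        best_acc, best_threshold = acc, threshold

print(best_acc, best_threshold)
```

So, as far as I can tell, the chosen threshold is simply the one that maximises known-vs-novel classification accuracy against the test labels themselves.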

My understanding is that the optimal threshold is chosen to maximise accuracy using the ground-truth labels of the test set (stored in testIsKnownObj). Isn't that equivalent to assuming you already know whether each observed/grasped image in the test set is known or novel, even before predicting its class with either of the two networks? Am I missing something here? Without access to the ground truth, how would one decide between K-net and N-net at test time, i.e., complete the so-called "recollection stage" described in your paper?
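For concreteness, the label-free protocol I would have expected (this is my own assumption about how deployment could work, not something I found in the repo) is to calibrate the threshold once on a held-out labelled validation split, then route test images using only their predicted distances:

```python
import numpy as np

def calibrate_threshold(val_dist, val_is_known, thresholds=np.arange(0, 1.21, 0.01)):
    """Pick the threshold that best separates known from novel on a labelled
    validation split (labels are used here, but only at calibration time)."""
    accs = [np.mean((val_dist > t) == ~val_is_known) for t in thresholds]
    return thresholds[int(np.argmax(accs))]

def recollect(test_dist, threshold):
    """Label-free 'recollection stage': route an image to N-net (novel) when its
    nearest-neighbour distance exceeds the calibrated threshold, else to K-net."""
    return np.where(test_dist > threshold, "N-net", "K-net")

# Hypothetical calibration split with ground-truth labels:
val_dist = np.array([0.25, 0.95, 0.35, 1.05])
val_is_known = np.array([True, False, True, False])
t = calibrate_threshold(val_dist, val_is_known)

# At test time, only the distances are needed:
print(recollect(np.array([0.3, 1.0]), t))
```

Is something along these lines what you had in mind, or is the threshold sweep over the test labels only meant to report an upper-bound accuracy?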