MIT-Princeton Vision Toolbox for Robotic Pick-and-Place at the Amazon Robotics Challenge 2017 - Robotic Grasping and One-shot Recognition of Novel Objects with Deep Learning.
Thanks for providing these great data and resources.
I have been trying to understand how the KvN accuracy is first computed in your script ./image-matchin/evaluateModel.m and then reused to choose between K-net and N-net in your second script, evaluateTwoStage.m.
The specific portion of code I am confused about is:
    bestKnownNovelAcc = 0;
    bestKnownNovelThreshold = 0;
    for threshold = 0:0.01:1.2
        knownNovelAcc = sum((predNnDist > threshold) == ~testIsKnownObj)/length(testIsKnownObj);
        if bestKnownNovelAcc < knownNovelAcc
            bestKnownNovelAcc = knownNovelAcc;
            bestKnownNovelThreshold = threshold;
        end
    end
My understanding is that the optimal threshold is chosen to maximise a metric that is computed from the ground-truth labels of the test set (stored as testIsKnownObj).
Isn't that the same as assuming that you already know whether the observed/grasped image in the test set shows a known or a novel object, even before predicting its class with either of the two networks? Am I missing something here?
How would one decide between K-net and N-net (i.e., conclude the so-called "recollection stage" in your paper) without access to the ground-truth labels?
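To make the question concrete, here is a minimal sketch of what I would have expected instead, assuming a held-out validation split were available: the threshold is swept on that split only, then frozen and applied to the test distances. The names valNnDist and valIsKnownObj are hypothetical (they do not appear in your scripts), and all numbers are made up purely for illustration.

```matlab
% Hypothetical held-out validation split (synthetic values, for illustration only).
valNnDist     = [0.203 0.352 0.404 0.553 0.701 0.852 0.903 1.054];
valIsKnownObj = logical([1 1 1 1 0 0 0 0]);

% Sweep the threshold on the validation split only (same sweep as in evaluateModel.m).
bestValAcc = 0;
bestThreshold = 0;
for threshold = 0:0.01:1.2
    valAcc = sum((valNnDist > threshold) == ~valIsKnownObj)/length(valIsKnownObj);
    if bestValAcc < valAcc
        bestValAcc = valAcc;
        bestThreshold = threshold;
    end
end

% Apply the frozen threshold to the test split (synthetic values again).
% The test labels are used only to score the decision, never to choose it.
predNnDist     = [0.25 0.50 0.60 0.80 0.95 0.62];
testIsKnownObj = logical([1 1 0 0 0 1]);
predIsNovel = predNnDist > bestThreshold;  % true: treat as novel (N-net), false: treat as known (K-net)
knownNovelAcc = sum(predIsNovel == ~testIsKnownObj)/length(testIsKnownObj);
fprintf('threshold = %.2f, val acc = %.3f, test KvN acc = %.3f\n', ...
    bestThreshold, bestValAcc, knownNovelAcc);
```

With this setup the K-net/N-net decision for a test image depends only on its nearest-neighbour distance and a threshold fixed beforehand. Is this roughly what was done on the real system, or is the reported KvN accuracy meant as an upper bound?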