floodsung / LearningToCompare_FSL

PyTorch code for CVPR 2018 paper: Learning to Compare: Relation Network for Few-Shot Learning (Few-Shot Learning part)
MIT License

About the testing problem #6

Open xenuts opened 6 years ago

xenuts commented 6 years ago

Nice work, but I found a problem that really confuses me.

As shown in the code omniglot_train_few_shot.py, in both the training and testing phases, the support set (i.e. sample_images) and the evaluation set (i.e. test_images) are drawn from the same 5 classes (called one task). With your way of calculating accuracy, it's easy to achieve ~99% during training.
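To make the setup concrete, here is a rough sketch of how I understand one task/episode is sampled (the function and `load_images` are hypothetical names for illustration, not the repo's exact API):

```python
import random

CLASS_NUM = 5
SAMPLE_NUM_PER_CLASS = 5

def sample_episode(character_folders, load_images):
    """Build one episode: support and query share the SAME 5 classes."""
    episode_classes = random.sample(character_folders, CLASS_NUM)
    support, query = [], []
    for label, cls in enumerate(episode_classes):
        images = load_images(cls)  # all images of this character class
        random.shuffle(images)
        support += [(img, label) for img in images[:SAMPLE_NUM_PER_CLASS]]
        query += [(img, label) for img in images[SAMPLE_NUM_PER_CLASS:]]
    # Accuracy is then the fraction of query images whose argmax relation
    # score (over the 5 in-episode classes) matches the true label.
    return support, query
```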

Here is where I found a problem I can't explain: I drew the support set from one task and the evaluation set from another task, so the two tasks contain different 5 classes.

So I presumed I would get low confidences after feeding them into the network, but the results say otherwise. Here is the test case I used:

[TESTING set] CLASS_NUM=5, SAMPLE_NUM_PER_CLASS=5; the character classes (5 samples each) are:

- Angelic/character11
- Syriac_(Serto)/character08
- Japanese_(hiragana)/character42
- Gujarati/character27
- Glagolitic/character09

[SUPPORT set] CLASS_NUM=5, SAMPLE_NUM_PER_CLASS=5; the character classes (5 samples each) are:

- N_Ko/character27
- Japanese_(katakana)/character18
- Oriya/character33
- Tibetan/character14
- Tifinagh/character45

And I got the output confidences via `probs, predict_labels = torch.max(relations.data, 1)`, as below: 0.9999995, 0.9999894, 0.00067013013, 1.0, 0.9999995, 0.0013619913, 0.45683807, 0.003507328, 0.99994755, 0.20433362, 0.9999981, 0.76437086, 0.4761213, 0.99345946, 0.25436476, 0.0002244339, 0.00026010931, 0.87288016, 1.8067769e-05, 0.00053879694, 1.0, 1.0, 1.0, 1.0, 1.0
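For clarity, `torch.max(relations.data, 1)` just returns, per query row, the largest of its 5 relation scores and its index; since each score is an independent sigmoid output of the relation module (the paper regresses it toward 0/1 with MSE), the 5 scores in a row are not normalized against each other. A toy example with made-up values:

```python
import torch

# One row per query image, one column per support class. Each entry is an
# independent sigmoid relation score, NOT a softmax probability, so a row
# does not sum to 1 and all 5 entries can be close to 1.0 at once.
relations = torch.tensor([[0.99, 0.98, 0.97, 1.00, 0.99],   # all high
                          [0.01, 0.02, 0.00, 0.03, 0.01]])  # all low
probs, predict_labels = torch.max(relations, 1)
print(probs)           # tensor([1.0000, 0.0300])  -> best score per query
print(predict_labels)  # tensor([3, 3])            -> argmax class index
```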

It's really weird: the support and testing sets contain different classes, yet the output confidences are this high.

If I randomly pick an image from the entire Omniglot dataset and (let's assume) I don't know its class, how could I recognize its class by comparing it against all possible support sets? The output confidences barely have any discriminability.
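Concretely, the recognition procedure I have in mind looks like this (all helper names are hypothetical; `embed` and `relation` stand for the two trained modules):

```python
import torch

def classify_unknown(query_img, candidate_supports, embed, relation):
    # candidate_supports: {class_name: support feature map} over ALL classes
    # I might encounter, not just the 5 classes of one episode.
    q = embed(query_img)
    scores = {}
    for name, s in candidate_supports.items():
        # Concatenate support and query features along the channel dim and
        # score the pair, as is done for in-episode pairs.
        scores[name] = relation(torch.cat([s, q], dim=1)).item()
    best = max(scores, key=scores.get)
    return best, scores[best]  # unreliable if the scores barely discriminate
```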

Am I missing anything important, or have I misunderstood something?

SISUNG commented 5 years ago

Hi, I ran into the same problem as you during testing. Did you figure it out later? Thanks in advance for your reply.