ServiceNow / embedding-propagation

Codebase for Embedding Propagation: Smoother Manifold for Few-Shot Classification. This is a ServiceNow Research project that was started at Element AI.
Apache License 2.0

Results far worse than reported #29

Open shwangtangjun opened 1 year ago

shwangtangjun commented 1 year ago
  1. Reproduced results are far worse than reported in the paper

Specifically, on mini-ImageNet with the ResNet12 backbone, I get 52.38 for 1-shot SSL and 67.02 for 5-shot SSL, compared to 75.36 (1-shot SSL) and 84.07 (5-shot SSL) in Table 4 of the paper. The performance gap is huge, which I find very confusing since I have not changed any of the provided code. I use the pkl file provided in the README and the pretrained & finetuned model you provided here, namely episodic_miniimagenet_resnet12_1-shot.tar.bz2, which I believe is the correct pretrained model. Here is the exp_dict.json I get.

I doubt whether the pretrained model you provide is valid. Or maybe the pkl file yields different input image data than the npz file in your original code, which would prevent the pretrained model from adapting. In fact, when I follow your data-processing pipeline and apply the TADAM pre-processing to a more widely used format of mini-ImageNet (which contains images.zip), the result given by the pretrained model is still low. Which form of input data is your pretrained model actually suited to?

Also, for mini-ImageNet with the WRN backbone, I see a problem similar to Issue #12: the result is about 5% lower than reported for 1-shot. I don't know whether the problem comes from the dataset or the pretrained model.
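As a sanity check on point 1, one can inspect the checkpoint's state-dict key names to confirm which backbone the file actually contains before fine-tuning. The substrings checked below are only illustrative guesses, not this repo's actual parameter names:

```python
def infer_backbone(state_dict):
    """Heuristically guess the backbone from state-dict key names.

    NOTE: the substrings checked here ("wrn", "widen", "layer4",
    "resnet") are illustrative guesses, not this repo's actual
    layer names -- adjust them after printing the real keys.
    """
    keys = list(state_dict)
    if any("wrn" in k or "widen" in k for k in keys):
        return "wrn"
    if any("layer4" in k or "resnet" in k for k in keys):
        return "resnet12"
    return "unknown"


# Example with fabricated key names:
fake_resnet12 = {"layer4.conv1.weight": None, "classifier.weight": None}
print(infer_backbone(fake_resnet12))  # resnet12
```

With a real checkpoint one would pass `torch.load("checkpoint_best.pth", map_location="cpu")` (the weights may sit under a nested key such as `"model"`; print the top-level keys first).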

  2. Where should the pretrained model be placed?

It is unclear where I should put the provided pretrained model. Take mini-ImageNet + ResNet12 + 1-shot as an example:

(1) logs/finetuning/: the files score_list_best.pkl, checkpoint_best.pth, and exp_dict.json all lie directly in logs/finetuning/, and "pretrained_weights_root" is changed to './logs/finetuning'. However, the program then fails to find the pretrained model.

(2) logs/finetuning/episodic_miniimagenet_resnet12_1-shot/: "pretrained_weights_root" is again set to './logs/finetuning'. The program can run. However, if other folders exist under logs/finetuning/, e.g. logs/finetuning/episodic_miniimagenet_wrn_1-shot/ or logs/finetuning/episodic_miniimagenet_resnet12_5-shot/, the program searches every subfolder of logs/finetuning/ and attempts to load all of them. Even though I changed ssl_exps.py to run only the mini-ImageNet + ResNet12 + 1-shot experiment, the code still tries to load the WRN models and raises a dimension-mismatch error.

Moreover, what is the parameter 'finetuned_weights_root' for? It is also set to "./logs/finetuning", the same as "pretrained_weights_root".
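For point 2, a defensive fix would be to match each candidate folder's exp_dict.json against the requested experiment before loading its weights, instead of loading every checkpoint found under "pretrained_weights_root". A minimal sketch; the flat key/value exp_dict layout here is an assumption (the real exp_dict is nested), and `find_matching_exp` is a hypothetical helper, not part of the repo:

```python
import json
import os


def find_matching_exp(root, target):
    """Return subfolder names under `root` whose exp_dict.json matches
    every key/value pair in `target`.

    NOTE: assumes a flat exp_dict.json like {"backbone": "resnet12", ...};
    the repo's real exp_dict is nested, so the lookup would need adapting.
    """
    matches = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "exp_dict.json")
        if not os.path.isfile(path):
            continue  # skip files and folders without an exp_dict.json
        with open(path) as f:
            exp = json.load(f)
        if all(exp.get(k) == v for k, v in target.items()):
            matches.append(name)
    return matches
```

Only the folders returned by `find_matching_exp(root, {"backbone": "resnet12"})` would then have their checkpoint_best.pth loaded, so a WRN checkpoint sitting in a sibling folder could no longer trigger a dimension mismatch.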

  3. How can I reproduce the results in Table 1 of the paper (or the table in the README), where NO unlabeled samples are used? The parameter "unlabeled_size_test" should presumably be set to 0, but simply changing the value raises an error related to 'NoneType', which seems to be caused by taking an empty slice of a list. Could you provide a runnable program that reproduces Table 1?
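For reference on point 3, the 'NoneType' error presumably comes from the episode-splitting code returning None when the unlabeled portion is empty. A sketch of the guard I would expect, returning an empty list instead of None; `split_episode` and its arguments are hypothetical, not the repo's actual API:

```python
def split_episode(samples, support_size, query_size, unlabeled_size):
    """Split one episode's samples into support/query/unlabeled parts.

    NOTE: hypothetical sketch of the fix, not the repo's code. The key
    point is that unlabeled is always a list, never None, so downstream
    code can iterate or concatenate it when unlabeled_size == 0.
    """
    support = samples[:support_size]
    query = samples[support_size:support_size + query_size]
    start = support_size + query_size
    # An empty slice is already [], so no None ever escapes this function.
    unlabeled = samples[start:start + unlabeled_size]
    return support, query, unlabeled


# With unlabeled_size_test = 0, the unlabeled split is simply empty:
s, q, u = split_episode(list(range(10)), 5, 3, 0)
print(u)  # []
```

If the repo instead returns None for the unlabeled batch somewhere, replacing that with an empty container (or branching on `unlabeled_size_test == 0` before slicing) should let the Table 1 setting run.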