Hi @ZohrehAdabi
Can it be a problem related to the batch-norm statistics?
Since the backbone uses batch-norm, you should be careful with the calls to model.train() and model.eval() to avoid any update of the batch-norm statistics during testing. Take a look at this discussion for more details.
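For reference, a minimal PyTorch sketch (a toy example, not the DKT code) of the behaviour in question: in train() mode BatchNorm updates its running statistics on every forward pass, while in eval() mode the stored statistics are frozen and reused:
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 16, 16)

bn.train()
_ = bn(x)                                   # train mode: running_mean / running_var are updated
stats_after_train = bn.running_mean.clone()

bn.eval()
_ = bn(x)                                   # eval mode: stored statistics are reused, not updated
print(torch.equal(stats_after_train, bn.running_mean))  # True: eval() left them untouched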
I've already checked that, @mpatacchiola. In the correct() function you use .eval():
with torch.no_grad(), gpytorch.settings.num_likelihood_samples(32):
    self.model.eval()
    self.likelihood.eval()
    self.feature_extractor.eval()
    z_query = self.feature_extractor.forward(x_query).detach()
    if(self.normalize): z_query = F.normalize(z_query, p=2, dim=1)
    z_query_list = [z_query]*len(y_query)
    predictions = self.likelihood(*self.model(*z_query_list)) #return n_way MultiGaussians
    predictions_list = list()
    for gaussian in predictions:
        predictions_list.append(torch.sigmoid(gaussian.mean).cpu().detach().numpy())
    y_pred = np.vstack(predictions_list).argmax(axis=0) #[model, classes]
    top1_correct = np.sum(y_pred == y_query)
    count_this = len(y_query)
return float(top1_correct), count_this, avg_loss/float(N+1e-10)
and also, before calling model.test_loop():
model.eval()
acc_mean, acc_std = model.test_loop( novel_loader, return_std = True)
Is there anything else I should check? Thanks.
Hi @ZohrehAdabi
Things I would check are the following:
- drop_last, which must be set to False in test mode (see the toy sketch below).
- The kernel_type parameter in the config.py file. You could switch to the linear kernel, for instance, which is much simpler than the others and is unlikely to have been changed in the latest version of the library.
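As a toy illustration of the drop_last point (plain PyTorch, unrelated to the repo's episodic data manager), drop_last=True silently discards the final incomplete batch, so an evaluation run would see fewer samples:
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10))
print(sum(1 for _ in DataLoader(ds, batch_size=3, drop_last=False)))  # 4 batches (last one holds a single sample)
print(sum(1 for _ in DataLoader(ds, batch_size=3, drop_last=True)))   # 3 batches, the trailing sample is dropped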
Hi @mpatacchiola
I used test.py from here and just added some code to test two models instead of one, so both models use the same data_loader and get the same transformations (including normalization):
datamgr = SetDataManager(image_size, n_eposide = iter_num, n_query = 15, **few_shot_params)
if params.dataset == 'cross':
    if split == 'base':
        loadfile = configs.data_dir['miniImagenet'] + 'all.json'
    else:
        loadfile = configs.data_dir['CUB'] + split + '.json'
elif params.dataset == 'cross_char':
    if split == 'base':
        loadfile = configs.data_dir['omniglot'] + 'noLatin.json'
    else:
        loadfile = configs.data_dir['emnist'] + split + '.json'
else:
    loadfile = configs.data_dir[params.dataset] + split + '.json'
novel_loader = datamgr.get_data_loader(loadfile, aug=False)
Based on the PyTorch DataLoader docs, drop_last is False by default. In the meta-learning of DKT there is a for loop over tasks; in each iteration the whole data of one task is used for optimization, and all tasks have the same size. Therefore we should be using the same data for the two tests from novel_loader. Is this the case?
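A quick sanity check I could run (my own sketch, reusing novel_loader from the snippet above) to see whether two passes over the loader really yield the same episodes:
import torch

# Compare the first episode of two independent passes over novel_loader.
# If this prints False, the episodic sampler re-draws random classes/episodes
# on every iteration, so the two models are NOT evaluated on identical data.
x_first_pass, _ = next(iter(novel_loader))
x_second_pass, _ = next(iter(novel_loader))
print(torch.equal(x_first_pass, x_second_pass))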
For simplicity, I also used the linear kernel. Thanks.
@ZohrehAdabi I am not sure where the problem could be.
It could be an issue with the dataloader. You could try to comment out the data manager lines and pass synthetic tasks (e.g. images of random Gaussian noise) that you build in advance. You can just create your own task as a tensor and use it in the two phases to see if the output changes. If the accuracy stays the same, this would strongly suggest that the culprit is the data manager.
If the test above still shows the discrepancy, then something I would try is to run the same code with another model (e.g. ProtoNets) to see whether the issue is due to some of the code in DKT.py.
Hi @mpatacchiola
Using random tasks like this:
tasks = []
for i in range(5):
    data_0 = torch.randn([16, 3, 84, 84])
    data_1 = torch.randn([16, 3, 84, 84])
    data_2 = torch.randn([16, 3, 84, 84])
    data_3 = torch.randn([16, 3, 84, 84])
    data_4 = torch.randn([16, 3, 84, 84])
    data = torch.stack([data_0, data_1, data_2, data_3, data_4])
    tasks.append(data)
and using model.correct(tasks[i]), the accuracies stay the same.
Thank you. I will test each model in a separate run, so that the data_loader randomness does not interfere and the test is safe.
Why do you think the data_manager creates such an issue?
There is another problem with ACC! I define two models at the same time,
elif params.method == 'DKT':
    last_model = DKT(model_dict[params.model], **few_shot_params)
    best_model = DKT(model_dict[params.model], **few_shot_params)
and in the remainder of the code the boolean variables best and last control which models run. Using random tasks, there is no change in the ACCs when I change the order of the tests for best_model and last_model. But when I comment out the definition of one model, like this:
elif params.method == 'DKT':
    #last_model = DKT(model_dict[params.model], **few_shot_params)
    best_model = DKT(model_dict[params.model], **few_shot_params)
I get a higher ACC for the other model.
Why does the definition of the models affect the test? I used the random tasks and this still happens (the models have different ids and use their own state_dicts as before).
The data_manager seems to be the issue for the previous problem.
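If the loader is indeed re-sampling episodes on every pass, one workaround (a sketch of what I would try; seeded_test is a hypothetical helper, not part of the repo) is to reset the random seed right before each test_loop call, so that both models consume identical random episodes:
import random
import numpy as np
import torch

def seeded_test(model, loader, seed=0):
    # Reset every global RNG the episodic sampler (and any transform) may draw from,
    # so that consecutive evaluations see exactly the same sequence of episodes.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    return model.test_loop(loader, return_std=True)

# acc_best, std_best = seeded_test(best_model, novel_loader)
# acc_last, std_last = seeded_test(last_model, novel_loader)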
For the second problem with the two models, are you loading the same pretrained models for both objects or did you initialize them from scratch?
Hi @mpatacchiola I initialize them from scratch:
#best, last = True, True
best, last = True, False
if best and best_modelfile is not None:
    best_model = best_model.cuda()
    tmp = torch.load(best_modelfile)
    best_model.load_state_dict(tmp['state'])
if last and last_modelfile is not None:
    last_model = last_model.cuda()
    tmp = torch.load(last_modelfile)
    last_model.load_state_dict(tmp['state'])
and then run test_loop(). When last_model and best_model are both instantiated but only best_model is initialized and best_model.test_loop() is run, I get some accuracy for best_model. But when, in another run of test.py, the last_model is commented out, best_model gets a different (higher) accuracy. Can creating an instance of a model affect the other instances of it?
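Could it be that instantiating the second model consumes draws from the global RNG during weight initialization, so every random number generated afterwards, including my random tasks, comes out differently? A toy check of that hypothesis (plain PyTorch, nothing from the repo):
import torch

torch.manual_seed(0)
_extra_model = torch.nn.Linear(10, 10)        # instantiating a module draws from the global RNG
with_extra_model = torch.randn(3)

torch.manual_seed(0)
without_extra_model = torch.randn(3)          # same seed, but no extra module created first

print(with_extra_model)
print(without_extra_model)                    # different: the extra instantiation shifted the RNG state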
Thanks.
Hi @mpatacchiola
I'm using the DKT code for classification on CUB [5-way, 1-shot]. I save two models during meta-training, best_model and last_model, and in test.py I add some code to test both of them. I create two instances of the DKT class and load the saved files for each of them (see the snippets above). When I run test.py for these models, I have a problem with the ACC: if I run best_model or last_model alone, I get certain ACCs; if I run both of them (best, last = True, True), I get a different ACC for the model that is run in the second step (running last_model after the best_model test changes the ACC of last_model).
Could you please help me figure out what the problem is? I really appreciate any help you can provide.