brjathu / iTAML

Official implementation of "iTAML: An Incremental Task-Agnostic Meta-learning Approach" (CVPR 2020).

Something strange about the update of theta and psi in the inner loop #13

Closed genghuanlee closed 3 years ago

genghuanlee commented 4 years ago

Hi Jathushan,

Thanks for your awesome work. I have a question about the paper, though. On page 3, in the paragraph titled "Inner loop", you state that theta is updated in the inner loop for all tasks, but psi_i is updated only for the i-th task. Earlier you mention that you split the model into two parts: theta is the part that produces the feature vector v, and psi is the part that produces the predictions p. But in my opinion the two parts form one model, and when we train it the backward pass goes through both parts simultaneously. So I am confused about how you manage the separate updates; from Algorithm 1 on page 3 and the train function in the code, I can't see how it is done. Am I missing something again? Could you explain it for me, please?

Best. C

brjathu commented 4 years ago

Thanks. Yes, theta is updated for every task, while psi_i is updated only for the corresponding task. The gradients are not computed separately, as you suspect: in the inner loop we update both theta and psi_i with a backward pass, and then in the outer loop only theta is updated, using a weighted average over all tasks.
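
To make the flow concrete, here is a minimal, hypothetical sketch of that schedule (illustrative names such as base_model.theta, base_model.psi and task_loaders, not the repository code), assuming theta holds only floating-point parameters and using a plain uniform average in the outer loop:

    import copy
    import torch
    import torch.nn.functional as F

    # Hypothetical sketch of the inner/outer update; names are illustrative.
    theta_start = copy.deepcopy(base_model.theta.state_dict())   # shared theta at the start of the round
    adapted_thetas = []

    for i, task_loader in enumerate(task_loaders):                # inner loop, one pass per task
        base_model.theta.load_state_dict(theta_start)             # every task adapts from the same theta
        opt = torch.optim.SGD(list(base_model.theta.parameters())
                              + list(base_model.psi[i].parameters()), lr=0.01)
        for x, y in task_loader:
            logits = base_model.psi[i](base_model.theta(x))       # only head i is used for task i
            loss = F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()                                       # gradients reach theta and psi_i only
            opt.step()
        adapted_thetas.append(copy.deepcopy(base_model.theta.state_dict()))

    # Outer loop: theta becomes a (here, uniform) average of the task-adapted copies;
    # each psi_i keeps the value it reached in its own inner loop.
    avg = {k: torch.stack([s[k] for s in adapted_thetas]).mean(0) for k in theta_start}
    base_model.theta.load_state_dict(avg)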

genghuanlee commented 4 years ago

Thanks, I get it. I have another question, though. When I train and test on MNIST, I notice something strange about the accuracy after the meta-test: the accuracy on the first task reaches almost 100 percent, while the accuracies on the second task, the third task, and so on get lower and lower. This confuses me, and I can't find the answer in the code or the paper. Can you explain it to me? Thanks.

brjathu commented 4 years ago

Are you referring to this issue? https://github.com/brjathu/iTAML/issues/10

genghuanlee commented 4 years ago

Thanks, I get it now. I have found the cause of my issue: I want to apply the method to a new dataset, but the number of images belonging to the different tasks is unbalanced, and because of that the accuracies on the different tasks differ widely.

genghuanlee commented 4 years ago

Sorry to bother you again, but I have another question. When I read this code:

    main_learner = Learner(model=model, args=args, trainloader=train_loader, testloader=test_loader, use_cuda=use_cuda)
    main_learner.learn()
    memory = inc_dataset.get_memory(memory, for_memory)
    acc_task = main_learner.meta_test(main_learner.best_model, memory, inc_dataset)

I find that Learner() does not use the memory at all, while you do use the memory in meta_test. But when I read the paper, I find that you train the model on both the new task and the memory. Can you explain this to me?

Also, back to my first question about the update of theta and psi: when I read the code, I am still confused. Here is the code of the outer update.

        for i,(p,q) in enumerate(zip(model.parameters(), model_base.parameters())):
            alpha = np.exp(-self.args.beta*((1.0*self.args.sess)/self.args.num_task))
            ll = torch.stack(reptile_grads[i])
            p.data = torch.mean(ll,0)*(alpha) + (1-alpha)* q.data  

Here p runs over the model's whole set of parameters, which doesn't seem to match what you said: 'in the outer loop only theta is updated using a weighted average of all tasks'.

brjathu commented 4 years ago

No worries,

Learner takes the data loaders as inputs, which are generated here:

    task_info, train_loader, val_loader, test_loader, for_memory = inc_dataset.new_task(memory)

The train_loader contains data from both the new task and the memory.
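
A minimal sketch of that idea, with hypothetical names (full_train_dataset, memory_indices and new_task_dataset are illustrative, not the repository's):

    from torch.utils.data import ConcatDataset, DataLoader, Subset

    # Hypothetical sketch: mix the new task's samples with the stored exemplars
    # before handing everything to Learner.
    memory_dataset = Subset(full_train_dataset, memory_indices)   # exemplars kept from old tasks
    combined = ConcatDataset([new_task_dataset, memory_dataset])  # new task + memory
    train_loader = DataLoader(combined, batch_size=64, shuffle=True)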

For the next part, the gradients are calculated only for the relevant part of the fully connected layer, so only the classification parameters for that task are updated. https://github.com/brjathu/iTAML/blob/e56e72baf82b1542e589b519e8a13c7301880650/learner_task_itaml.py#L148
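
A hedged sketch of that idea, assuming the classifier is a single linear layer over all classes and using illustrative names (feature_extractor, classifier, classes_per_task, task_memory_loader, t): the loss is computed only on task t's output slice, so only those rows of the classifier receive non-zero gradients.

    import torch
    import torch.nn.functional as F

    # Hypothetical sketch: adapt only the classifier slice belonging to task t.
    for p in feature_extractor.parameters():                  # keep theta fixed during this adaptation
        p.requires_grad_(False)

    opt = torch.optim.SGD(classifier.parameters(), lr=0.01)   # classifier is a single nn.Linear here
    lo, hi = t * classes_per_task, (t + 1) * classes_per_task # output slots belonging to task t

    for x, y in task_memory_loader:
        with torch.no_grad():
            feats = feature_extractor(x)
        logits = classifier(feats)                            # shape [batch, total_classes]
        loss = F.cross_entropy(logits[:, lo:hi], y - lo)      # labels remapped into task t's range
        opt.zero_grad()
        loss.backward()   # only rows lo:hi of the weight (and bias) get non-zero gradients,
        opt.step()        # so with plain SGD the other rows stay unchanged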

As for updating each parameter tensor in one go, that was simply much faster. However, we could use the version below as well, which touches only the task-specific rows of the classifier:

    for i, (p, q) in enumerate(zip(model.parameters(), model_base.parameters())):
        alpha = np.exp(-self.args.beta * ((1.0 * self.args.sess) / self.args.num_task))
        ll = torch.stack(reptile_grads[i])
        # Classifier weight (here 10 classes x 256 features, 2 classes per task):
        # update each task's 2-row block only from that task's adapted copy.
        if p.data.dim() == 2 and p.data.size(0) == 10 and p.data.size(1) == 256:
            for ik in sessions:
                p.data[2*ik[0]:2*(ik[0]+1), :] = (ll[ik[1]][2*ik[0]:2*(ik[0]+1), :] * alpha
                                                  + (1 - alpha) * q.data[2*ik[0]:2*(ik[0]+1), :])
        else:
            # All other parameters (theta): weighted average over the adapted copies.
            p.data = torch.mean(ll, 0) * alpha + (1 - alpha) * q.data
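
For intuition on the alpha term used in both versions: alpha shrinks as the session index grows, so later sessions keep more of the previous model's parameters (q.data). A quick check with purely illustrative values for beta and num_task:

    import numpy as np

    beta, num_task = 1.0, 10                       # illustrative values, not the paper's settings
    for sess in range(num_task):
        alpha = np.exp(-beta * (1.0 * sess) / num_task)
        print(sess, round(float(alpha), 3))        # alpha decays from 1.0 toward exp(-beta)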