brjathu / iTAML

Official implementation of "iTAML: An Incremental Task-Agnostic Meta-learning Approach", CVPR 2020

about adding parameters #8

Closed JoyHuYY1412 closed 3 years ago

JoyHuYY1412 commented 4 years ago

I am curious about how you compute the sum of the parameters of the networks for each task. Did you apply any normalization?

brjathu commented 4 years ago

Hi @JoyHuYY1412, it is just an average over all the tasks. No, we haven't used any normalization.

See https://github.com/brjathu/iTAML/blob/e56e72baf82b1542e589b519e8a13c7301880650/learner_task_itaml.py#L163
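For illustration, here is a minimal sketch (PyTorch-style Python, not the repository's exact code) of such a plain average over per-task model copies; `average_task_models` and `task_state_dicts` are hypothetical names.

```python
import copy

import torch


def average_task_models(task_state_dicts):
    """Plain (unweighted) average of parameters across per-task model copies.

    `task_state_dicts` is a list of `state_dict()` objects, one per task,
    all produced by models with identical architectures.
    """
    avg = copy.deepcopy(task_state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in task_state_dicts])
        # No per-task normalization, just the mean over tasks.
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg
```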

JoyHuYY1412 commented 4 years ago

> Hi @JoyHuYY1412, it is just an average over all the tasks. No, we haven't used any normalization.
>
> See https://github.com/brjathu/iTAML/blob/e56e72baf82b1542e589b519e8a13c7301880650/learner_task_itaml.py#L163

Thank you for your reply. If different tasks have parameters of different scales, e.g., A >> B, it seems the averaged network will be biased toward A. So when we try to recover the network for task B, do we assume the memory samples of B can help it fit? I'm not sure whether I understand this correctly.

brjathu commented 4 years ago

Yes, that's one reason the classifier for a task is trained only in the inner loop. Also, to reduce this bias we take a weighted average of the weights as we progress. And yes, fine-tuning with the memory samples helps to recover a better model for the task.
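To illustrate the weighted-averaging idea, here is one plausible scheme (a hedged sketch; `merge_into_phi` and the `t / (t + 1)` weights are assumptions, not necessarily the repository's exact weighting): the running meta-parameters phi keep a progressively larger share as more tasks are seen, so the latest task does not dominate.

```python
import torch


@torch.no_grad()
def merge_into_phi(phi_state, new_task_state, task_id):
    """Weighted merge of the current meta-parameters with the newly learned
    task parameters. With alpha = t / (t + 1), the tasks already folded into
    phi keep weight t/(t+1) and the new task gets 1/(t+1), so each task ends
    up with roughly equal influence on the merged model.
    """
    alpha = task_id / (task_id + 1.0)
    merged = {}
    for key in phi_state:
        merged[key] = (alpha * phi_state[key].float()
                       + (1.0 - alpha) * new_task_state[key].float()).to(phi_state[key].dtype)
    return merged
```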

JoyHuYY1412 commented 4 years ago

> Yes, that's one reason the classifier for a task is trained only in the inner loop. Also, to reduce this bias we take a weighted average of the weights as we progress.

Thank you so much. I have two more questions.

  1. In the pseudocode (Algorithm 1) of your paper, after we update phi in line 14, is the theta used in line 7 for task 1 initialized from the updated phi? And does task 2 then initialize its theta from task 1?
  2. If so, does this operation somehow relieve the imbalance between tasks, since after each update in the outer loop the backbone network is reset?
brjathu commented 4 years ago
  1. No, in the inner loop the theta for every task is initialized from the last updated phi. Once we have learned all the thetas, we combine them to get the new phi, which is later used to initialize the thetas for the next task batch (see the sketch below).
  2. Yes, the outer-loop meta-update tries to minimize forgetting, while the imbalance is reduced mostly by the exponential averaging of the weights.
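For concreteness, a hypothetical sketch of that loop order (`meta_update_sketch`, `inner_steps`, and the plain averaging at the end are assumptions, not the repository's code): every task's theta starts from the same current phi, the thetas are learned independently, and only afterwards are they combined into the new phi that seeds the next batch of tasks.

```python
import copy

import torch
from torch import nn, optim


def meta_update_sketch(phi_model, task_loaders, inner_steps=1, inner_lr=0.01):
    """One outer-loop step: each theta is initialized from the *same* current
    phi, trained on its own task, and the resulting thetas are then combined
    into the new phi used for the next batch of tasks.
    """
    criterion = nn.CrossEntropyLoss()
    task_states = []
    for loader in task_loaders:                      # inner loop over tasks
        theta = copy.deepcopy(phi_model)             # theta_t <- current phi
        opt = optim.SGD(theta.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            for x, y in loader:
                opt.zero_grad()
                criterion(theta(x), y).backward()
                opt.step()
        task_states.append(theta.state_dict())
    # Outer update: combine the learned thetas into the new phi
    # (a plain average here; the thread above describes a weighted average).
    new_phi = copy.deepcopy(task_states[0])
    for key in new_phi:
        stacked = torch.stack([sd[key].float() for sd in task_states])
        new_phi[key] = stacked.mean(dim=0).to(new_phi[key].dtype)
    phi_model.load_state_dict(new_phi)
    return phi_model
```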