Hi, I read your code and I have two questions.
The first one: in the paper, the author didn't mention increasing the output dimension when training on new tasks, but your code adds nodes to the fully connected layer when training on new tasks.
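To be concrete about what I mean by "adding nodes", here is a minimal sketch in plain Python. The function name `expand_fc`, the list-of-rows weight layout, and the small random initialization are my own assumptions for illustration, not taken from your code or from the paper.

```python
import random


def expand_fc(weights, biases, num_new_classes, seed=0):
    """Return copies of (weights, biases) with extra output rows appended.

    weights: list of rows, one row per existing output class.
    biases:  list with one bias per existing output class.
    Hypothetical helper: the init range below is an assumption.
    """
    rng = random.Random(seed)
    in_features = len(weights[0])

    # Copy the old parameters so the outputs for old classes are unchanged.
    new_weights = [row[:] for row in weights]
    new_biases = biases[:]

    # Append one freshly initialized output node per new class.
    for _ in range(num_new_classes):
        new_weights.append(
            [rng.uniform(-0.01, 0.01) for _ in range(in_features)]
        )
        new_biases.append(0.0)

    return new_weights, new_biases
```

For example, `expand_fc(w, b, 3)` on a layer with 2 outputs would return a layer with 5 outputs whose first 2 rows are the old ones.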
The second one: in the paper, the classification loss has two parts. One is the traditional CE loss, and the other is a CE loss over the non-target labels that uses "1 - prediction".
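In case my description is unclear, this is how I read the two parts, as a minimal sketch for a single sample. I am assuming that `probs` holds independent per-class probabilities in [0, 1] (for example sigmoid outputs); the function name `two_part_loss` and the `eps` smoothing are my own and may not match the paper exactly.

```python
import math


def two_part_loss(probs, target, eps=1e-12):
    """Return (target_term, non_target_term) for one sample.

    probs:  per-class probabilities, assumed independent (e.g. sigmoid).
    target: index of the ground-truth class.
    eps:    small constant to avoid log(0); an assumption of this sketch.
    """
    # Part 1: the usual CE term on the target label, -log(p_target).
    target_term = -math.log(probs[target] + eps)

    # Part 2: CE on the non-target labels using "1 - prediction",
    # i.e. the sum of -log(1 - p_y) over every y != target.
    non_target_term = -sum(
        math.log(1.0 - p + eps)
        for i, p in enumerate(probs)
        if i != target
    )
    return target_term, non_target_term
```

The total classification loss in this reading would be the sum of the two returned terms.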
Thanks for your attention!