Why BCE is used instead of CE with Softmax?

brjathu / iTAML

Official implementation of "iTAML : An Incremental Task-Agnostic Meta-learning Approach". CVPR 2020


Why BCE is used instead of CE with Softmax? #11

Closed JoyHuYY1412 closed 3 years ago

JoyHuYY1412 commented 4 years ago

Each task seems to be a multi-class classification problem, so why not use nn.CrossEntropyLoss?

brjathu commented 4 years ago

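To minimize catastrophic forgetting, it's better not to push down the previously learned class distributions, even when those classes are absent from the current training data. Softmax normalizes over all classes, so cross-entropy on a new-class sample necessarily lowers the scores of the old classes, whereas BCE scores each class independently.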
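The point above can be illustrated with a small sketch of the per-logit gradients. This is not taken from the iTAML code: the class split, the logit values, and the helper names (`ce_softmax_grad`, `bce_grad`) are hypothetical, and it assumes the BCE loss is computed only over the current task's logits. It relies on two standard identities: the gradient of softmax cross-entropy with respect to the logits is `softmax(z) - onehot`, and the gradient of summed sigmoid BCE is `sigmoid(z) - target`.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ce_softmax_grad(logits, target):
    # d(CE)/d(logit_i) = softmax_i - onehot_i: every logit receives a gradient.
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def bce_grad(logits, target, active):
    # d(BCE)/d(logit_i) = sigmoid_i - target_i for classes in `active`.
    # Logits outside `active` do not enter the loss, so their gradient is 0.
    return [
        (sigmoid(z) - (1.0 if i == target else 0.0)) if i in active else 0.0
        for i, z in enumerate(logits)
    ]

# Hypothetical setup: classes 0-1 belong to an old task, classes 2-3 to the current one.
logits = [2.0, 1.0, 0.5, -0.5]
target = 2                 # a sample from the current task
current_classes = {2, 3}   # assumption: BCE is restricted to these logits

ce_grad = ce_softmax_grad(logits, target)
bce_g = bce_grad(logits, target, current_classes)

print("CE  grad:", [round(g, 3) for g in ce_grad])
print("BCE grad:", [round(g, 3) for g in bce_g])
```

Under gradient descent a positive gradient lowers a logit, so in this sketch the softmax cross-entropy update pushes down both old-class logits, while the restricted BCE update leaves them untouched. Even without the restriction, each BCE gradient would depend only on its own logit rather than on the scores of the other classes.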