Hi, thanks for your great work. In your paper, you said that "We remove the ReLU activation at the last block of each ResNet", but I have read the code of "my_resnet.py" and found that you remove the last ReLU activation in every block of the ResNet.
ReLU is removed at the very last block because we need the full amplitude (negative and positive) for the cosine classifier (pretty much everyone does that).
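For intuition, here is a minimal sketch of a cosine classifier (not the repository's exact implementation; the class name and the fixed `scale` are illustrative). Because both the features and the class weights are L2-normalized, the sign of each feature dimension carries information, so clamping negatives with a trailing ReLU would throw away half the hypersphere:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Logits are scaled cosine similarities between L2-normalized
    features and L2-normalized class weight vectors."""

    def __init__(self, in_features: int, num_classes: int, scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features) * 0.01)
        self.scale = scale  # often a learnable parameter in practice

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Normalizing both operands makes the dot product a cosine similarity;
        # a ReLU on `features` would restrict them to the nonnegative orthant.
        return self.scale * F.linear(F.normalize(features, dim=1),
                                     F.normalize(self.weight, dim=1))
```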
ReLU is also removed at the end of the other blocks because it works better with the POD distillation losses. Some papers and blog articles have noted that removing these ReLUs doesn't change a ResNet's performance.
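A minimal sketch of what this looks like in a basic residual block (this is illustrative, not a copy of `my_resnet.py`; names and hyperparameters are placeholders). The only change versus a standard block is that no activation follows the residual addition, so the raw signed feature maps are what the distillation loss sees:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlockNoEndReLU(nn.Module):
    """Standard ResNet basic block, minus the ReLU that normally
    follows the residual addition."""

    def __init__(self, in_planes: int, planes: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = None
        if stride != 1 or in_planes != planes:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False),
                nn.BatchNorm2d(planes),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x if self.downsample is None else self.downsample(x)
        out = F.relu(self.bn1(self.conv1(x)))  # the inner ReLU is kept
        out = self.bn2(self.conv2(out))
        # No F.relu here: the block outputs signed pre-activation features,
        # which the distillation loss (and, for the last block, the cosine
        # classifier) consume directly.
        return out + identity
```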