mstaczek opened this issue 2 years ago
Hey @mstaczek
The truth is, DenseNet is quite tricky with LRP, in the sense that the DenseLayers within the DenseBlocks end with a linear layer without an activation and start with a BatchNorm. This means that, with the skip connections, multiple BatchNorm layers are connected to multiple Linear layers, so they cannot be merged into the linear layer as is normally done.
Image from the paper:
Text representation of the torchvision model:
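For reference, it can be reproduced with something like the following (a minimal sketch, assuming a recent torchvision):

```python
import torchvision

# the pretrained weights are irrelevant for inspecting the module structure
model = torchvision.models.densenet121()

# printing the model shows that each DenseLayer inside a DenseBlock starts
# with a BatchNorm2d and ends with a Conv2d that has no activation after it
print(model)
```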
Using DenseNet without canonizers does not work correctly:
You can use the Epsilon rule in BatchNorm layers for slightly better results:
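For example, a composite with such a mapping could be built like this (a rough sketch; `EpsilonPlusFlat` is just one choice of composite, and I am assuming the `layer_map` argument of the built-in composites to prepend the rule):

```python
from zennit.composites import EpsilonPlusFlat
from zennit.rules import Epsilon
from zennit.types import BatchNorm

# prepend an Epsilon rule for all BatchNorm layers to the composite's default
# layer map, so BatchNorms use Epsilon instead of the plain gradient
composite = EpsilonPlusFlat(layer_map=[(BatchNorm, Epsilon())])
```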
Here's some code to produce heatmaps with densenet121:
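A minimal sketch along these lines (the image path `input.jpg`, the target class index 0, and the weights-loading call are placeholders to adapt to your setup):

```python
import torch
from PIL import Image
from torchvision.models import densenet121
from torchvision.transforms import functional as TF

from zennit.attribution import Gradient
from zennit.composites import EpsilonPlusFlat
from zennit.image import imgify
from zennit.rules import Epsilon
from zennit.types import BatchNorm

# torchvision >= 0.13; older versions use densenet121(pretrained=True)
model = densenet121(weights='IMAGENET1K_V1').eval()

# load and preprocess an input image ('input.jpg' is a placeholder path)
image = Image.open('input.jpg').convert('RGB')
data = TF.normalize(
    TF.to_tensor(TF.resize(image, [224, 224])),
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225],
)[None]
data.requires_grad = True

# Epsilon rule for the BatchNorm layers, as described above
composite = EpsilonPlusFlat(layer_map=[(BatchNorm, Epsilon())])

# one-hot output relevance for the class of interest (index 0 is a placeholder)
target = torch.eye(1000)[[0]]

with Gradient(model=model, composite=composite) as attributor:
    output, attribution = attributor(data, target)

# sum over the color channels and save the heatmap
heatmap = attribution.sum(1).detach().cpu().numpy()[0]
imgify(heatmap, symmetric=True, cmap='coldnhot').save('heatmap.png')
```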
Ultimately, a canonizer also needs to be implemented for DenseNet, due to its problematic BatchNorms. We cannot merge the BatchNorms into the adjacent linear layer, since multiple BatchNorms use the same linear layer, and we cannot merge the adjacent linear layer into the BatchNorms, since the BatchNorm is not expressive enough.
There are a few different configurations of BatchNorms in DenseNet that each need their own handling. One possibility might be an approach similar to the ResNetCanonizer, and then setting the connected linear layers to become the identity. This is probably very involved. There may be things that I overlooked, but LRP for DenseNet is, by the design of LRP, currently quite a challenge to get right. It needs careful thinking in order to be done as implied by the definition of LRP.
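If someone wants to attempt it, a custom canonizer would roughly have to fill in zennit's `Canonizer` interface; the following is only a hypothetical skeleton (the name `DenseNetCanonizer` and all of its internals are placeholders, not an actual implementation):

```python
from zennit.canonizers import Canonizer


class DenseNetCanonizer(Canonizer):
    '''Hypothetical skeleton for a DenseNet-specific canonizer.'''

    def apply(self, root_module):
        # find the problematic BatchNorm/linear configurations inside
        # root_module and return one registered canonizer instance per match
        instances = []
        # ... module discovery and instance creation would go here ...
        return instances

    def register(self, *args, **kwargs):
        # modify the matched modules in-place so the model stays equivalent
        # but becomes LRP-friendly
        pass

    def remove(self):
        # revert all modifications made in register
        pass
```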
I will try to discuss this in our lab and see if there's a better solution, but until then you can try the Epsilon rule for BatchNorm layers.
Wow, I did not expect it to be such a challenge!
Thank you for the explanation and sample heatmaps. They really help convince me that a custom canonizer is necessary for DenseNets. I will think about implementing one after reading more about DenseNet, its blocks, and LRP.
I wanted to use LRP with DenseNet121 from torchvision. So far, in zennit.torchvision I have found canonizers for ResNet and VGG, and I wonder whether I can use them (to get some results) or whether I need to write my own custom canonizer (because the network has some layers that are not covered by the existing canonizers?).
Thanks for your help!