Closed (piyush-das closed 3 years ago)
Handling the first layer is difficult because, once it is pruned, part of its output has to go through the residual block and part has to go through the shortcut. This imposes several constraints, and the easiest way out was to assign a very high importance to the first layer so that it is never pruned. The layer is very small and contributes minimally to the compute cost.
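To illustrate the idea, here is a minimal sketch in plain Python of how a hardcoded high importance keeps a layer out of a global pruning ranking. It is not the code in `main.py`: the names `PROTECTED_IMPORTANCE`, `protect_layer`, and `select_channels_to_prune`, the layer names, and the scores are all hypothetical, and the ranking scheme (prune the globally lowest-scoring fraction of channels) is an assumption made for the example.

```python
# Hypothetical sentinel, chosen to be far above any real importance score.
PROTECTED_IMPORTANCE = 1e9


def protect_layer(scores, layer_name, value=PROTECTED_IMPORTANCE):
    """Return a copy of `scores` with every channel of `layer_name` set to `value`."""
    protected = {name: list(vals) for name, vals in scores.items()}
    protected[layer_name] = [value] * len(scores[layer_name])
    return protected


def select_channels_to_prune(scores, prune_fraction):
    """Rank all channels globally and return the lowest-scoring fraction
    as (layer_name, channel_index) pairs."""
    ranked = sorted(
        (score, name, idx)
        for name, vals in scores.items()
        for idx, score in enumerate(vals)
    )
    n_prune = int(len(ranked) * prune_fraction)
    return [(name, idx) for _, name, idx in ranked[:n_prune]]


# Made-up per-channel importance scores for three layers.
scores = {
    "conv1": [0.01, 0.02, 0.03, 0.04],        # first layer: low raw scores
    "layer1.conv1": [0.5, 0.1, 0.9, 0.3],
    "layer1.conv2": [0.2, 0.8, 0.05, 0.6],
}

# Without protection, the first layer's channels are pruned first.
unprotected = select_channels_to_prune(scores, prune_fraction=0.5)

# With protection, the same budget is spent entirely on the other layers.
pruned = select_channels_to_prune(protect_layer(scores, "conv1"), prune_fraction=0.5)
```

With the sentinel in place, the first layer's channels sort to the end of the ranking, so the pruning budget is spent on the remaining layers and the channel bookkeeping between the residual path and the shortcut never has to be handled for that layer.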
Hi,
For the particular case of ResNet, it seems that the first layer has been hardcoded with a very high importance value, as indicated here: https://github.com/EkdeepSLubana/OrthoReg/blob/869f04969dcbd827d25c48246774d243bef355db/main.py#L883 Could you please explain the rationale for doing so?
Thanks