After some correspondence with the authors, it turns out that the skip connections are parameterized by their own 1x1 convolution, separate from the one that feeds the add block. This should be an easy fix, and the discrepancy could explain the poor training results seen so far.
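To make the distinction concrete, here is a minimal, framework-free sketch of a block with two independent 1x1 convolutions, one producing the skip output and one feeding the add. It is not the authors' code: the class name `ResidualBlock`, the helper names, the channel sizes, and the `tanh` stand-in for the block's real nonlinearity are all assumptions made for illustration.

```python
import math
import random


def conv1x1(weight, x):
    """Apply a 1x1 convolution: a per-timestep linear map across channels.

    weight has shape [out_channels][in_channels]; x has shape [in_channels][time].
    """
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weight
    ]


def random_weight(out_channels, in_channels, rng):
    """Hypothetical initializer: small uniform weights."""
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(in_channels)]
        for _ in range(out_channels)
    ]


class ResidualBlock:
    """Illustrative block with separate 1x1 weights for the two output paths."""

    def __init__(self, residual_channels, skip_channels, seed=0):
        rng = random.Random(seed)
        # Two independent parameter sets, one per path.
        self.w_residual = random_weight(residual_channels, residual_channels, rng)
        self.w_skip = random_weight(skip_channels, residual_channels, rng)

    def forward(self, x):
        # Stand-in for the block's actual nonlinearity (assumption).
        hidden = [[math.tanh(v) for v in row] for row in x]

        # Path 1: 1x1 convolution whose result goes into the add block.
        residual = conv1x1(self.w_residual, hidden)
        out = [[a + b for a, b in zip(x_row, r_row)]
               for x_row, r_row in zip(x, residual)]

        # Path 2: a separate 1x1 convolution producing the skip output.
        skip = conv1x1(self.w_skip, hidden)
        return out, skip


block = ResidualBlock(residual_channels=2, skip_channels=3)
x = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]  # [channels][time]
out, skip = block.forward(x)
```

Because the two paths hold their own weights, the skip output can use a different channel count from the residual path, and changing `w_skip` leaves `out` untouched.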