Object detection with multi-level representations generated from deep high-resolution representation learning (HRNetV2h). This is an official implementation for our TPAMI paper "Deep High-Resolution Representation Learning for Visual Recognition". https://arxiv.org/abs/1908.07919
Apache License 2.0
Upsample: differences in paper from implementation #45
In `_make_fuse_layers`, the upsampling is done after the 1×1 convolution. However, in the paper the upsampling is done before:

> If x > r, f_{xr}(R) upsamples the input representation R through bilinear upsampling followed by a 1 × 1 convolution for aligning the number of channels.

Moreover, the paper uses bilinear upsampling, while the implementation uses `mode='nearest'`.
Is there any reason for these two differences?
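On the ordering question, it may help to note that a 1×1 convolution mixes channels pointwise while upsampling mixes only spatial positions, so the two operations commute mathematically: either order yields the same output (up to floating point), but applying the convolution at the lower resolution, as the implementation does, costs fewer FLOPs. A minimal numpy sketch, where `conv1x1` and `upsample_nearest` are illustrative helpers (not functions from this repo):

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W); w: (C_out, C_in) -- pointwise channel mixing,
    # i.e. a bias-free 1x1 convolution
    return np.einsum('oc,chw->ohw', w, x)

def upsample_nearest(x, scale):
    # nearest-neighbor upsampling: repeat each pixel along both spatial axes
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2, 2))   # toy low-resolution feature map
w = rng.standard_normal((8, 4))      # toy 1x1 conv weights

a = upsample_nearest(conv1x1(x, w), 2)   # implementation order: conv, then upsample
b = conv1x1(upsample_nearest(x, 2), w)   # paper order: upsample, then conv
print(np.allclose(a, b))  # → True: the two orders commute
```

The same commutation argument applies to bilinear upsampling, since it is also a linear, channel-independent operation; the nearest/bilinear choice changes the output values themselves, not the ordering question.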