Closed caixd-220529 closed 2 weeks ago
The motivation is to sum operations that act on different receptive fields, similar to how RepLK sums the outputs of a large-kernel conv and a small-kernel conv. We found it to be more effective this way.
In your experiments, is there a serious performance drop if Eq. 8 is replaced by $Z^{(i)}=\mathrm{IWT}(Z^{(i+1)},Y_{H}^{(i)})$? I am not an expert in computer vision, and setting up the experiment myself would cost me a lot of time... 😢
In early versions of the layer's architecture, that variant didn't perform as well, but I did not try it on a larger scale (such as the ablation study in Table 8).
Equation 8 of the paper is:
$Z^{(i)}=\mathrm{IWT}(Y_{LL}^{(i)}+Z^{(i+1)},Y_{H}^{(i)})$
However, the information in the approximation at the current level ($Y_{LL}^{(i)}$) is already included in its decomposition. Why does the paper use Eq. 8 instead of:
$Z^{(i)}=\mathrm{IWT}(Z^{(i+1)},Y_{H}^{(i)})$
which is closer to the standard inverse wavelet transform?
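To make the difference between the two formulas concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes a one-level 1D Haar transform, the helper names `haar_wt` and `haar_iwt` are hypothetical, and random vectors stand in for $Y_{LL}^{(i)}$, $Y_{H}^{(i)}$ and $Z^{(i+1)}$. Because the IWT is linear, Eq. 8 should equal the standard-style reconstruction plus $\mathrm{IWT}(Y_{LL}^{(i)}, 0)$, and the sketch checks this numerically.

```python
import numpy as np

def haar_wt(x):
    """One-level 1D Haar transform: returns (low-pass, high-pass) halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_iwt(low, high):
    """Inverse of haar_wt: interleaves the reconstructed even/odd samples."""
    out = np.empty(2 * low.size)
    out[0::2] = (low + high) / np.sqrt(2)
    out[1::2] = (low - high) / np.sqrt(2)
    return out

# Random stand-ins (assumption: in the real layer these are conv outputs).
rng = np.random.default_rng(0)
y_ll = rng.standard_normal(4)    # plays the role of Y_LL^{(i)}
y_h = rng.standard_normal(4)     # plays the role of Y_H^{(i)}
z_next = rng.standard_normal(4)  # plays the role of Z^{(i+1)}

z_eq8 = haar_iwt(y_ll + z_next, y_h)        # Eq. 8
z_std = haar_iwt(z_next, y_h)               # standard-style alternative
extra = haar_iwt(y_ll, np.zeros_like(y_ll)) # contribution of Y_LL alone

# By linearity of the IWT, the two variants differ exactly by IWT(Y_LL, 0).
assert np.allclose(z_eq8, z_std + extra)
```

Under these assumptions, the only thing Eq. 8 adds over the alternative is the upsampled low-frequency output of the current level, which matches the "sum of operations on different receptive fields" reading given in the reply.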