BGU-CS-VIL / WTConv

Wavelet Convolutions for Large Receptive Fields. ECCV 2024.
MIT License

Question about Eq. 8 of the paper #22

Closed caixd-220529 closed 2 weeks ago

caixd-220529 commented 2 weeks ago

Equation 8 of the paper is:

$Z^{(i)}=\mathrm{IWT}(Y_{LL}^{(i)}+Z^{(i+1)},Y_{H}^{(i)})$

However, the approximation information of the current level ($Y_{LL}^{(i)}$) is already contained in its decomposition. Why does the paper use Eq. 8 instead of:

$Z^{(i)}=\mathrm{IWT}(Z^{(i+1)},Y_{H}^{(i)})$

which is closer to the standard inverse wavelet transform?

shahaffind commented 2 weeks ago

The motivation is to sum up different operations that operate on different receptive fields, similar to how RepLK sums the outputs of a large-kernel conv and a small-kernel conv. We found it to be more effective this way.
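To make the difference between the two reconstruction rules concrete, here is a minimal 1-D Haar sketch. All names (`wt`, `iwt`, etc.) are illustrative, not from the WTConv code, and the "convolutions" are omitted (identity), so Eq. 8's sum simply double-counts the approximation here; in the actual layer, each term is the output of a learned depthwise conv, so the sum aggregates branches with different receptive fields.

```python
import numpy as np

def wt(x):
    """One level of the 1-D Haar wavelet transform: (low-pass, high-pass)."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return lo, hi

def iwt(lo, hi):
    """Inverse of wt(): reconstruct the even/odd samples and interleave."""
    x = np.empty(2 * lo.size)
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

x = np.arange(8, dtype=float)

# Two-level decomposition: level 2 decomposes level 1's approximation.
yll1, yh1 = wt(x)
yll2, yh2 = wt(yll1)

# Deepest level (base case, Z^(3) = 0): Z^(2) = IWT(YLL^(2), YH^(2)).
z2 = iwt(yll2, yh2)

# Questioner's variant (standard IWT): recovers x exactly.
z1_standard = iwt(z2, yh1)
assert np.allclose(z1_standard, x)

# Eq. 8: also add level 1's own low-pass term before inverting.
# With identity "convs" this differs from x; with learned convs it is
# a residual-style sum of multi-scale branch outputs.
z1_eq8 = iwt(yll1 + z2, yh1)
assert not np.allclose(z1_eq8, x)
```

The contrast shows why the two equations are not equivalent: the standard IWT treats $Z^{(i+1)}$ as a drop-in replacement for the approximation band, whereas Eq. 8 keeps the current level's low-pass branch as an additional summand.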

caixd-220529 commented 2 weeks ago

Is there a serious performance drop if Eq. 8 is replaced by $Z^{(i)}=\mathrm{IWT}(Z^{(i+1)},Y_{H}^{(i)})$ in your experiments? I am not an expert in computer vision, and the experimental setup might cost me a lot of time... 😢

shahaffind commented 2 weeks ago

In early versions of the layer's architecture it did not perform as well, but I have not tried this at a larger scale (such as the ablation study in Table 8).