Fig 2 / Eq(3) and Code Conflict

deepernewbie commented 3 years ago

In the paper, Fig. 2 and the corresponding Eq. (3) indicate that, after the features of the supporting and reference frames are concatenated and passed through the bottleneck conv layer, the resulting tensor is used to calculate the offsets, while the deformable convolution operates "only" on the supporting frame (as dictated by Eq. 3). However, your code uses the bottleneck output as the input to the first deformable conv block. Any comment on this?

YapengTian commented 3 years ago

The descriptions in the paper are consistent with the code. During deformable alignment, the concatenated features from the supporting and reference frames are used only to generate offsets (sampling positions) for the convolutions performed over the supporting frame; features from the reference frame are not used to reconstruct the aligned frame. Note that deformable alignment consists of two parts: offset generation and deformable convolution. The concatenated features are used only in the first part.
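The two-part split can be illustrated with a 1-D toy. This is only a sketch under strong assumptions, not the repository's code: the learned offset predictor is replaced by a hypothetical brute-force matcher (`generate_offsets`), and the deformable convolution is reduced to a single-tap kernel with weight 1, so it becomes plain sampling at shifted positions (`deformable_sample`). All function names here are made up for illustration. The point is that the reference frame influences where samples are taken, while every output value is read from the supporting frame.

```python
def sample_linear(signal, pos):
    """Linearly interpolate a 1-D list at a fractional position, clamping at the borders."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    lo = int(pos)
    hi = min(lo + 1, len(signal) - 1)
    frac = pos - lo
    return (1.0 - frac) * signal[lo] + frac * signal[hi]


def generate_offsets(ref, supp, max_shift=2):
    """Part 1, offset generation: looks at BOTH frames.

    Toy stand-in (an assumption) for the learned offset predictor: for each
    position, pick the shift whose supporting value best matches the
    reference value, preferring small shifts on ties.
    """
    shifts = sorted(range(-max_shift, max_shift + 1), key=abs)
    offsets = []
    for i, target in enumerate(ref):
        best = min(shifts, key=lambda d: abs(sample_linear(supp, i + d) - target))
        offsets.append(float(best))
    return offsets


def deformable_sample(supp, offsets):
    """Part 2, deformable sampling: reads values from the supporting frame ONLY."""
    return [sample_linear(supp, i + d) for i, d in enumerate(offsets)]


# The supporting signal is the reference signal shifted right by one position.
ref = [0, 1, 2, 3, 0, 0, 0, 0]
supp = [0, 0, 1, 2, 3, 0, 0, 0]

offsets = generate_offsets(ref, supp)      # uses ref and supp
aligned = deformable_sample(supp, offsets)  # uses supp and the offsets, never ref
```

Note that `deformable_sample` does not take `ref` as an argument at all, which mirrors the claim that reference-frame features never enter the reconstruction of the aligned frame.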

See L203: fea = (self.deconv_3(supp, offset3)) -> only features sampled from the supporting frame are used to reconstruct the aligned frame. The early several deformable convolution layers are only used to transform the concatenated features for predicting the offset3. Thus, these deformable convolutions do not perform deformable alignment (they behave much like regular convolutions). Table 5 of the paper includes an ablation study on models with different numbers of dconv layers: https://openaccess.thecvf.com/content_CVPR_2020/papers/Tian_TDAN_Temporally-Deformable_Alignment_Network_for_Video_Super-Resolution_CVPR_2020_paper.pdf.
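The data flow described above can be traced with a small sketch. It covers only the steps mentioned in this thread and is not the repository's code: the layer labels (`early_dconv_k`, `align_dconv`) are illustrative, the default of two early layers is an assumption, and so is the detail that each early layer predicts its offsets from the same fused feature it samples.

```python
def alignment_dataflow(num_early_layers=2):
    """List (layer, tensor it samples from, tensor its offsets are predicted from).

    All names are illustrative labels, not identifiers from the repository,
    and num_early_layers=2 is an assumed default.
    """
    flow = []
    fea = "bottleneck(concat(ref, supp))"
    for k in range(1, num_early_layers + 1):
        # Early layers: sampled input and offsets both come from the fused
        # feature, so these layers only transform features for offset prediction.
        flow.append((f"early_dconv_{k}", fea, fea))
        fea = f"early_dconv_{k}(fea)"
    # Alignment layer: offsets come from the fused feature, samples from supp only.
    flow.append(("align_dconv", "supp", fea))
    return flow


flow = alignment_dataflow()
for layer, sampled, offset_src in flow:
    print(f"{layer}: samples from {sampled}; offsets from {offset_src}")
```

Only the last entry samples from `supp`, which corresponds to the `self.deconv_3(supp, offset3)` call quoted above; the earlier entries never produce the aligned feature directly.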

deepernewbie commented 3 years ago

Thanks, it is much clearer now, especially with your comment "The early several deformable convolution layers are only used to transform the concatenated features for predicting the offset3".

YapengTian / TDAN-VSR-CVPR-2020

Fig 2 / Eq(3) and Code Conflict #59