What's the difference between DeformConv_d and DeformConvPack_d?

XinyiYing / D3Dnet

Repository for "Deformable 3D Convolution for Video Super-Resolution", SPL, 2020

Apache License 2.0

What's the difference between DeformConv_d and DeformConvPack_d? #12

Open zzwei1 opened 3 years ago

zzwei1 commented 3 years ago

Thanks for the repo. I don't understand the difference between DeformConv_d and DeformConvPack_d. The main difference I found in the source code (deform_conv.py) is that DeformConv_d uses offset = temp.clone().resize(b, 81, t, h, w), while DeformConvPack_d uses temp1 = temp.clone()[:,0:81-c,:,:,:] followed by offset = torch.cat((temp.clone(),temp1),dim=1). Could you tell me when I should use DeformConv_d and when I should use DeformConvPack_d? By the way, I found two .py files in the modules directory: deform_conv_seperate.py and deform_conv.py. What is the difference between them, or in other words, when should I use each one? Looking forward to your reply!

XinyiYing commented 3 years ago

Thanks for this comment. Please refer to ./dcn/test.py for example usage; it also illustrates the difference between DeformConv_d and DeformConvPack_d. Specifically: "DeformConvPack" performs D3D deformation in all three dimensions using its own offsets; "DeformConv" performs D3D deformation in all three dimensions using extra offsets; "DeformConvPack_d" performs D3D deformation in optional dimensions using its own offsets; "DeformConv_d" performs D3D deformation in optional dimensions using extra offsets. We have deleted ./modules/deform_conv_seperate.py and ./functions/deform_conv_func_seperate.py and updated the repo.
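The "own offsets" versus "extra offsets" distinction can be seen in a toy 1D example. This is a minimal sketch in plain Python, not the repository's CUDA implementation: the names sample_linear, deform_conv1d, conv1d and DeformConv1dPack are hypothetical, and the real D3D layers work on 5D tensors with learned weights. The only point is who supplies the offsets: the caller (non-Pack) or the layer itself (Pack).

```python
import math


def sample_linear(signal, pos):
    """Linearly interpolate `signal` at fractional index `pos` (zero outside)."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deform_conv1d(signal, weight, offsets):
    """Toy deformable conv with EXTRA offsets: the caller passes them in.

    offsets[i][j] shifts kernel tap j at output position i.
    """
    half = len(weight) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weight):
            acc += w * sample_linear(signal, i + (j - half) + offsets[i][j])
        out.append(acc)
    return out


def conv1d(signal, weight):
    """Ordinary convolution, i.e. a deformable conv with all-zero offsets."""
    zeros = [[0.0] * len(weight) for _ in signal]
    return deform_conv1d(signal, weight, zeros)


class DeformConv1dPack:
    """Toy 'Pack' variant: predicts its OWN offsets from its input."""

    def __init__(self, weight, offset_weights):
        self.weight = weight
        # One plain conv kernel per tap; each one predicts that tap's offset.
        self.offset_weights = offset_weights

    def __call__(self, signal):
        per_tap = [conv1d(signal, w) for w in self.offset_weights]
        offsets = [[per_tap[j][i] for j in range(len(self.weight))]
                   for i in range(len(signal))]
        return deform_conv1d(signal, self.weight, offsets)


x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0]

# Non-Pack: offsets come from outside the layer.
extra = [[0.5, 0.5, 0.5] for _ in x]
y_extra = deform_conv1d(x, w, extra)

# Pack: the layer computes offsets internally (zero kernels here, so no shift).
pack = DeformConv1dPack(w, [[0.0, 0.0, 0.0]] * 3)
y_own = pack(x)
```

With the zero offset kernels used above, the Pack layer reduces to an ordinary convolution, which is a convenient sanity check before training.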

zzwei1 commented 3 years ago

Thanks for your quick reply! I have read ./dcn/test.py and I now understand the difference between DeformConv_d and DeformConvPack_d. But here comes another question: what does the 'extra offset' mean? Does it mean that if I want to use DeformConv or DeformConv_d, I need to provide the 'extra offset' myself? Is there a formula that explains the extra offset? Thanks again.

XinyiYing commented 3 years ago

Deformable convolution has recently been applied to video super-resolution. Specifically, Tian et al. [R1] proposed a temporally deformable alignment network (TDAN) for video SR, in which the neighboring frames are aligned to the reference frame by deformable convolution. The architecture of the network is shown in Fig. 1.

Fig. 1 The architecture of TDAN.

As illustrated in Fig. 1, the offset is the "extra offset": it is generated from the neighboring frame F_i^LR and the reference frame F_t^LR rather than by the deformable convolution layer itself. Please refer to [R1] for more details. In short, some tasks need an "extra offset" to guide the deformable convolution, and our code offers the "extra offset" option for these tasks.
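Since a formula was asked for, the idea can be sketched in the usual deformable-convolution notation. This is not taken from the repository; the symbols w, p_n, \Delta p_n, g and f are notation assumed here for illustration. For an output position p_0 and a kernel with N sampling points p_n:

$$
y(p_0) = \sum_{n=1}^{N} w(p_n)\, x\big(p_0 + p_n + \Delta p_n\big)
$$

The two families of layers differ only in where \Delta p comes from:

$$
\text{own offsets: } \Delta p = g(x), \qquad
\text{extra offsets: } \Delta p = f\big(F_i^{LR}, F_t^{LR}\big) \text{ with } x = F_i^{LR}
$$

Here g stands for an offset-predicting convolution inside the layer, and f for a separate network supplied by the user, as in TDAN. For a 3×3×3 kernel, N = 27, and deforming in all three dimensions gives 3 × 27 = 81 offset channels, which presumably is the 81 that appears in deform_conv.py.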

[R1] Y. Tian, Y. Zhang, Y. Fu, and C. Xu, “TDAN: Temporally deformable alignment network for video super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

zzwei1 commented 3 years ago

Thanks a lot!