Hi, thanks for your excellent work. I found some incorrect or inaccurate implementation details in your Refiner.
First, in the `crop_patch()` function (lines 207~209 of ./model/refiner.py), your code calls `torch.Tensor.unfold` twice to unfold the input feature map into a tensor of sliding windows. However, in both your paper and your code, the `idx` tuple is obtained from an error map of resolution [H/4, W/4], where H and W denote the original resolution, while the feature map being unfolded has size [H/2, W/2]. To match the resolution for patch cropping, you set the unfold step to 2, which acts as a downsampling operation. Nevertheless, the result of `x.permute(0, 2, 3, 1).unfold(1, size + 2 * padding, size).unfold(2, size + 2 * padding, size)` is not [B, H/4, W/4, C, patch_size, patch_size] but [B, (H/2 - patch_size)/stride + 1, (W/2 - patch_size)/stride + 1, C, patch_size, patch_size] (see the torch.Tensor.unfold documentation), where patch_size = size + 2 * padding and stride = size. With size = 2 and padding = 3, this gives H/4 - 3 windows along the height, which is not equal to H/4. The result is not affected much because the padding is only 3, but with a larger padding value the error is no longer negligible.
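To make the arithmetic concrete, here is a minimal sketch of the window count that `torch.Tensor.unfold` produces along one dimension. The helper name `unfold_count` and the value H = 512 are hypothetical and chosen only for illustration. The sketch assumes the [H/2, W/2] feature map is unfolded as-is, with no padding applied to it beforehand.

```python
def unfold_count(length: int, window: int, step: int) -> int:
    """Number of windows torch.Tensor.unfold yields along a dimension of
    the given length: floor((length - window) / step) + 1."""
    if length < window:
        raise ValueError("window is larger than the dimension")
    return (length - window) // step + 1


# Hypothetical example values: original height H = 512, size = 2, padding = 3.
H, size, padding = 512, 2, 3
patch_size = size + 2 * padding  # window length passed to unfold

# Assumption: the feature map of height H/2 is unfolded without prior padding.
n_windows = unfold_count(H // 2, patch_size, size)
expected = H // 4  # number of rows in the error map that idx indexes

print(n_windows, expected)  # 125 128
```

Under this assumption the shortfall is 2 * padding / size windows per dimension, so it grows linearly with the padding.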