Closed. XiaoqiangZhou closed this issue 4 years ago.
The sampling kernel has no learnable parameters; it is a Gaussian kernel with a fixed variance and kernel size. We tried training the variance of the Gaussian kernel as a learnable parameter, but we found that training became unstable. We use the sampling correctness loss to train our model. This loss determines whether the currently sampled regions are "good" choices, which helps the flow fields find the correct sampling regions.
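To make the "fixed variance and kernel size" point concrete, here is a minimal sketch of such a non-learnable Gaussian weighting. The window `size` and `sigma` values are illustrative placeholders, not the exact hyper-parameters used in the paper:

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    # Fixed-variance Gaussian weights over a size x size window.
    # Nothing here is learned: size and sigma are constants.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()  # normalize so the weights sum to 1

w = gaussian_kernel(3, 1.0)
patch = np.arange(9, dtype=float).reshape(3, 3)
# Sampled value = weighted sum of the neighborhood around the sampling center.
value = (w * patch).sum()
```

The only trainable quantity is therefore *where* this window is placed (the flow offsets), not the weights inside it.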
Because the smoothing method is MATLAB code, calling it from Python every time is cumbersome. Therefore, for computational efficiency, our implementation directly uses `input_corrupted_structure_image = mask * (smooth(gt_image))` to calculate the input structures. We will try to port this code to Python once we finish some urgent tasks at hand.
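The two operation orders are not equivalent, which can be seen with a small numeric sketch. The `smooth` below is a hypothetical stand-in (a simple box blur) for the MATLAB smoothing code, used only to show where the outputs diverge:

```python
import numpy as np

def smooth(img, k=1):
    # Stand-in for the MATLAB structure-smoothing code: a box blur
    # over a (2k+1) x (2k+1) window with edge padding.
    out = np.empty_like(img, dtype=float)
    pad = np.pad(img.astype(float), k, mode='edge')
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = pad[i:i + 2 * k + 1, j:j + 2 * k + 1].mean()
    return out

gt = np.arange(16, dtype=float).reshape(4, 4)
mask = np.ones((4, 4))
mask[:, 2:] = 0.0  # right half of the image is "corrupted"

a = mask * smooth(gt)   # the implementation's order
b = smooth(mask * gt)   # the order the question proposes
```

Near the mask boundary the results differ: in `a`, valid pixels keep structure values smoothed from the full ground truth, while in `b` the zeroed-out region bleeds into them during smoothing.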
@RenYurui Thanks for your quick and detailed reply!
Your explanation answers my second question.
But I'm still a little confused by "It helps the flow fields find the correct sampling regions". Isn't the sampling region fixed for a given sampling center? In my opinion, the correctness loss guides the generated feature to resemble the gt_image's feature semantically. But I don't understand how this loss changes the preference of the Gaussian sampling regions, which is determined by hyper-parameters, i.e., delta_h, delta_v, and sigma.
By the way, what does "The sampling process calculates the gradients according to the input pixels (features)" mean in your paper? I don't follow the formulation of the gradient computation in the sampling process.
Best regards.
delta_h and delta_v are the offsets of a specific point: (delta_h, delta_v) is one point of the flow field. When (delta_h, delta_v) changes, the sampling region changes. Our sampling correctness loss helps the network find the correct sampling regions (i.e., obtain a reasonable delta_h and delta_v for each point).
The gradient of the sampling operation can be derived directly from the forward process. For example, bilinear sampling takes 4 local pixels as inputs: output = h*x1 + v*x2 + (1-h)*x3 + (1-v)*x4. The gradient with respect to h is x1 - x3, which is zero when x1 = x3; likewise, the gradient with respect to v is x2 - x4, which is zero when x2 = x4. Unfortunately, the pixel values within a local patch are often similar, so these gradients vanish. Therefore, we use Gaussian sampling to extend the receptive field of the sampling operation.
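The vanishing-gradient argument above can be checked numerically. This is a sketch of the simplified bilinear form quoted in the comment, not the full sampling implementation from the repository:

```python
# Forward pass of the simplified bilinear form:
#   output = h*x1 + v*x2 + (1-h)*x3 + (1-v)*x4
def bilinear(h, v, x1, x2, x3, x4):
    return h * x1 + v * x2 + (1 - h) * x3 + (1 - v) * x4

def offset_grads(x1, x2, x3, x4):
    # Analytic gradients of the output w.r.t. the offsets h and v.
    return x1 - x3, x2 - x4

# A locally flat patch in the horizontal direction: x1 == x3.
dh, dv = offset_grads(x1=0.5, x2=0.9, x3=0.5, x4=0.2)
# dh is exactly zero: the horizontal offset receives no gradient signal,
# which is why a wider Gaussian sampling window is used instead.
```

With more distant pixels entering the weighted sum, equal nearest neighbors no longer force the offset gradient to zero.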
@RenYurui Got it! Thanks~
Dear researcher, I have a few questions about the appearance flow mentioned in the paper. Could you please help me?
I understand equation (8) as a weighted sum of neighbor features, where the weight is correlated with the spatial distance to the sampling center. So, are there any learnable parameters in this "weight map"? If not, how does this method achieve an attention-like effect in your visualization of the appearance flow?
In your implementation, the corrupted image structure is `input_corrupted_structure_image = mask * (smooth(gt_image))`. Shouldn't it be `input_corrupted_structure_image = smooth(mask * gt_image)`? Thanks!