PeterL1n / BackgroundMattingV2

Real-Time High-Resolution Background Matting
MIT License

Question about the function "compute_pixel_indices" #50

Closed: yasar-rehman closed this issue 3 years ago

yasar-rehman commented 3 years ago

Hi Peter,

Thank you for sharing the source code, which is well written and self-explanatory. Could you explain lines 278-279? I have pasted them below for your convenience. I know they output the indices of the patch locations in the original input x, but I don't follow the logic behind these lines of code.

idx_pat = (c * H * W).view(C, 1, 1).expand([C, O, O]) + (o * W).view(1, O, 1).expand([C, O, O]) + o.view(1, 1, O).expand([C, O, O])
idx_loc = b * W * H + y * W * S + x * S
idx = idx_loc.view(-1, 1, 1, 1).expand([n, C, O, O]) + idx_pat.view(1, C, O, O).expand([n, C, O, O])

Kind regards,

PeterL1n commented 3 years ago

Let's use a 1D array as an example. Imagine the input is

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

and we want to split it into 4-element patches. We only want the first and third patches, so the result is supposed to be:

[[0, 1,  2,  3],
 [8, 9, 10, 11]]

idx_pat holds the offsets of the elements within a single patch. In this case, it would be

idx_pat = [0, 1, 2, 3]

idx_loc holds the starting index of each selected patch. In this case, it would be

idx_loc = [0, 8]

Therefore, the final indices are the sum of the two:

idx = idx_loc + idx_pat
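The 1D example can be run directly as plain Python. This is a minimal sketch: `idx_pat`, `idx_loc` and `idx` follow the explanation above, while `data`, `patch_size` and `selected` are hypothetical names for the toy input, the patch length and the chosen patch numbers.

```python
# 1D toy version: each selected element index = patch start + offset in patch.
data = list(range(12))          # input: [0, 1, ..., 11]
patch_size = 4

idx_pat = list(range(patch_size))              # offsets inside one patch: [0, 1, 2, 3]
selected = [0, 2]                              # keep the first and third patch
idx_loc = [p * patch_size for p in selected]   # patch start positions: [0, 8]

# Pair every patch start with every offset to get one flat index per element.
idx = [[loc + off for off in idx_pat] for loc in idx_loc]
patches = [[data[i] for i in row] for row in idx]
print(patches)  # [[0, 1, 2, 3], [8, 9, 10, 11]]
```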

In practice you need .view() and .expand() to reshape both tensors so that every patch location is combined with every offset, but this is the basic idea.
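A NumPy analogue of the reshape-and-expand step might look like this. It is a sketch, not the repository's code: `reshape` stands in for PyTorch's `.view()` and `np.broadcast_to` for `.expand()`, so both index arrays become shape (number of patches, patch length) before they are added.

```python
import numpy as np

data = np.arange(12)
idx_pat = np.arange(4)        # offsets within one patch
idx_loc = np.array([0, 8])    # start index of each selected patch

n, k = idx_loc.size, idx_pat.size
# reshape ~ .view(), broadcast_to ~ .expand(): both operands become shape (n, k)
idx = (np.broadcast_to(idx_loc.reshape(-1, 1), (n, k))
       + np.broadcast_to(idx_pat.reshape(1, -1), (n, k)))
patches = data[idx]           # gather the selected patches, shape (n, k)
```

NumPy would broadcast `idx_loc.reshape(-1, 1) + idx_pat.reshape(1, -1)` implicitly, and so would PyTorch; the explicit `broadcast_to` calls only mirror the `.expand()` calls in the quoted code, where the PyTorch equivalent of the first term would be `idx_loc.view(-1, 1).expand(n, k)`.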

In the actual code, it is a lot more complicated because you need to consider 2D inputs with multiple channels, and the patches can overlap. Getting the indices right is quite tricky. I spent a lot of time figuring it out; it involved a lot of drawing on paper as well as trial and error.
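The same idea extends to 2D, multi-channel, overlapping patches. Below is a hypothetical pure-Python sketch (`pixel_indices_sketch` is an illustrative name, not the repository's function) that spells out the vectorized expression as explicit loops. It assumes a contiguous (B, C, H, W) tensor, so the flat index of element [b, c, h, w] is b*C*H*W + c*H*W + h*W + w. It also assumes S = size is the stride and O = size + 2 * padding the patch side, with the image already padded by `padding` on every side, so patch (y, x) starts at row y*S, column x*S and neighboring patches share 2 * padding rows or columns. The quoted lines do not define S, O or n, so these are assumptions, and the batch offset here uses the batch stride C*H*W.

```python
from itertools import product

def pixel_indices_sketch(B, C, H, W, b, y, x, size, padding):
    """Flat indices of the C x O x O pixels of each selected patch (hypothetical sketch)."""
    O = size + 2 * padding
    idx = []
    for bi, yi, xi in zip(b, y, x):
        # "idx_loc" part: top-left corner of the patch in the flattened tensor
        loc = bi * C * H * W + yi * size * W + xi * size
        for c, r, s in product(range(C), range(O), range(O)):
            # "idx_pat" part: channel offset + row offset + column offset
            idx.append(loc + c * H * W + r * W + s)
    return idx

# Toy check: store each pixel's (b, c, h, w) coordinates in a row-major flat list.
B, C, S, P = 2, 3, 2, 1
H, W = 2 * S + 2 * P, 3 * S + 2 * P      # 2 x 3 grid of patches, padded by P
flat = list(product(range(B), range(C), range(H), range(W)))

b, y, x = [0, 1], [1, 0], [2, 1]         # hypothetical selected patches
idx = pixel_indices_sketch(B, C, H, W, b, y, x, S, P)
pixels = [flat[i] for i in idx]          # coordinates of every gathered pixel
```

Because `flat` stores coordinates instead of pixel values, `pixels` shows exactly which (b, c, h, w) each index lands on, which makes it easy to check the arithmetic against a drawing on paper.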

yasar-rehman commented 3 years ago

Hi Peter,

Thank you for the response and detailed explanation. This clarifies many things.

Cheers!