MarcoForte / FBA_Matting

Official repository for the paper F, B, Alpha Matting
MIT License

Distance Transform #17

Open bluesky314 opened 4 years ago

bluesky314 commented 4 years ago

Hey, I do not understand how the distance map is being used here and what the clicks variable is supposed to represent:

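import cv2
import numpy as np

def dt(a):
    # Euclidean distance from each nonzero pixel of `a` to the nearest zero pixel.
    return cv2.distanceTransform((a * 255).astype(np.uint8), cv2.DIST_L2, 0)

def trimap_transform(trimap):
    h, w = trimap.shape[0], trimap.shape[1]

    # Three Gaussian-weighted distance maps for each of the two trimap channels.
    clicks = np.zeros((h, w, 6))
    for k in range(2):
        # Channels for an empty trimap region are left as zeros.
        if np.count_nonzero(trimap[:, :, k]) > 0:
            # Negative squared distance to the nearest pixel marked in channel k.
            dt_mask = -dt(1 - trimap[:, :, k])**2
            L = 320
            # Gaussians with sigma = 0.02 * L, 0.08 * L and 0.16 * L pixels.
            clicks[:, :, 3*k] = np.exp(dt_mask / (2 * ((0.02 * L)**2)))
            clicks[:, :, 3*k+1] = np.exp(dt_mask / (2 * ((0.08 * L)**2)))
            clicks[:, :, 3*k+2] = np.exp(dt_mask / (2 * ((0.16 * L)**2)))

    return clicks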

Can you please explain this?

MarcoForte commented 4 years ago

Instead of only feeding the network the binary trimap we also feed the distance transformed version. The distance from the definite foreground and background regions is a strong indicator of what the alpha could be.
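To make the idea concrete, here is a minimal pure-Python sketch of the encoding on a toy one-row mask. It is not the repository's code: `distance_to_region` is a hypothetical brute-force stand-in for `cv2.distanceTransform`, `gaussian_encode` is an illustrative name, and `sigma=2.0` is a toy value chosen only so the falloff is visible on eight pixels.

```python
import math

def distance_to_region(mask):
    """Brute-force Euclidean distance from every pixel to the nearest marked pixel.

    Hypothetical stand-in for cv2.distanceTransform, only suitable for tiny masks.
    Returns None when the mask has no marked pixels.
    """
    marked = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not marked:
        return None
    return [[min(math.hypot(y - my, x - mx) for my, mx in marked)
             for x in range(len(row))]
            for y, row in enumerate(mask)]

def gaussian_encode(mask, sigma):
    """Map distances to values in (0, 1]: 1 inside the region, decaying with distance."""
    distances = distance_to_region(mask)
    if distances is None:
        # Mirrors the count_nonzero guard: an empty region yields all zeros.
        return [[0.0] * len(row) for row in mask]
    return [[math.exp(-d ** 2 / (2 * sigma ** 2)) for d in row] for row in distances]

# Toy mask: the two leftmost pixels are the known region.
toy_mask = [[1, 1, 0, 0, 0, 0, 0, 0]]
print(distance_to_region(toy_mask)[0])
print([round(v, 3) for v in gaussian_encode(toy_mask, sigma=2.0)[0]])
```

Pixels inside the known region get the value 1, and the value decays smoothly as the distance to that region grows, which gives the network a soft notion of "how far from known foreground or background" instead of only a binary mask.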

The clicks variable represents the transformed trimap. The variable name is not the most accurate and will eventually be fixed.

bluesky314 commented 4 years ago

Ok, but it does not seem like a simple distance transform. What do 3*k, 3*k+1, 3*k+2 and 2 * ((0.02 * L)**2), 2 * ((0.08 * L)**2) ... mean in the for loop? What is the whole loop doing? And why is clicks of dimension 6?

99991 commented 4 years ago

The distance transform is used to compute an approximate alpha matte based on the trimap.

The first function used here drops to approximately 0 at a distance of about 25 pixels, the second at about 100 pixels, and the third at about 200 pixels.

Here is a plot of the approximate alpha matte value against the distance for the first function:

https://www.wolframalpha.com/input/?i=plot+e%5E%28-%28%281-x%29%5E2%29+%2F+%282+*+%28%280.02+*+320%29%5E2%29%29%29

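clicks has 6 channels because the three functions are applied to the distance from both the definite foreground and the definite background of the trimap: channels 3*k, 3*k+1 and 3*k+2 hold the three versions for trimap channel k.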
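The numbers above can be checked with a short sketch. Each expression in the loop is a Gaussian of the distance with standard deviation 0.02 * L, 0.08 * L or 0.16 * L, that is 6.4, 25.6 and 51.2 pixels for L = 320. The helper names `falloff` and `channel_index` are illustrative and do not appear in the repository.

```python
import math

L = 320                      # reference length from trimap_transform
SCALES = (0.02, 0.08, 0.16)  # the three factors used in the loop

def falloff(distance, scale, length=L):
    """exp(-d^2 / (2 * sigma^2)) with sigma = scale * length, as in the loop body."""
    sigma = scale * length
    return math.exp(-distance ** 2 / (2 * sigma ** 2))

def channel_index(k, j):
    """Index into clicks for trimap channel k (0 or 1) and scale number j (0, 1 or 2)."""
    return 3 * k + j

# Print each curve at a few distances (in pixels) to see where it flattens out.
for scale in SCALES:
    values = [round(falloff(d, scale), 3) for d in (0, 10, 25, 100, 200)]
    print(f"sigma = {scale * L:.1f} px:", values)
```

At 25, 100 and 200 pixels respectively, each curve sits at roughly 3.9 standard deviations, where the value is below 0.001, which is why those distances are quoted as the points where the functions reach 0.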

bluesky314 commented 4 years ago

Thanks @99991 and @MarcoForte, but I don't see how the distance transform makes sense when the images are all at different scales. A distance in pixel space, which is what the distance transform measures, may not mean much when the image is a close-up of a person's face, because every point is then close to the object of interest.
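A small sketch of the concern, assuming the same scene is simply rendered at different resolutions. It evaluates the middle function (factor 0.08, fixed L = 320) for a pixel that is always 5% of the image width away from the known region. The helper names and the 5% figure are made up for illustration.

```python
import math

L = 320  # fixed reference length, as in trimap_transform

def encoded_value(distance_px, scale=0.08, length=L):
    """Gaussian of a pixel distance with sigma = scale * length."""
    sigma = scale * length
    return math.exp(-distance_px ** 2 / (2 * sigma ** 2))

def value_at_fraction(fraction, image_width, scale=0.08):
    """Encoded value for a pixel `fraction` of the image width away from the known region."""
    return encoded_value(fraction * image_width, scale)

# Same relative position in the scene, three different image widths.
for width in (320, 640, 1280):
    print(width, round(value_at_fraction(0.05, width), 4))
```

The same relative position gets a high value in the small image and a value near 0 in the large one, so the encoding depends on resolution unless the inputs are brought to a comparable scale first.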

bluesky314 commented 4 years ago

@99991 , @MarcoForte Can either of you clarify if I am getting something wrong about the appropriateness of distance maps?

bluesky314 commented 3 years ago

@99991, @MarcoForte Did you get a chance to think about the point above, that the distance transform does not take the scale of the image into account?