Estimate foreground and background put into GPU

Hi, Thanks for your work! It's very nice!

I used your approach to estimate foreground and background. But I find the time it costs is big. So I want to put it to GPU by Cupy or pytorch. And then I want to know estimate_fb_ml()function is base on which paper and "linear equation system" is what, and do you have some idea ?

Thank you very much !

Hi, Thanks for your work! It's very nice!

Thank you! :D

And then I want to know estimate_fb_ml()function is base on which paper and "linear equation system" is what, and do you have some idea ?

Maybe the term system of linear equations is more common. The paper is currently being reviewed.

But I find the time it costs is big. So I want to put it to GPU by Cupy or pytorch.

The estimate_fb_cf()-method does not work great on GPU because conjugate gradient descent to solve the system of linear equations converges very slowly without preconditioner. The incomplete thresholded Cholesky preconditioner works best, but it can not be parallelized easily. The Jacobi preconditioner can be used instead, but it is not as good. See the following code. You can increase the regularization parameter or decrease rtol to get faster results but blurrier images.

from matting import load_image, save_image, stack_images
from matting.util import solve_cg, vec_vec_outer, pixel_coordinates
from matting.util import sparse_conv_matrix
import argparse
import numpy as np
import scipy.sparse
import cupy as cp
import cupyx.scipy.sparse

def estimate_fb_cf_cupy(
    image,
    alpha,
    regularization=1e-5,
    neighbors=[(-1, 0), (1, 0), (0, -1), (0, 1)],
    max_iter=10000,
    atol=0.0,
    rtol=1e-5,
    print_info=False,
):
    h, w = image.shape[:2]
    n = w * h

    a = alpha.flatten()

    # Build sparse linear equation system
    U = scipy.sparse.bmat([[
        scipy.sparse.diags(a),
        scipy.sparse.diags(1 - a)]])

    # Directional derivative matrices
    Ds = [
        sparse_conv_matrix(w, h, [0, dx], [0, dy], [1.0, -1.0])
        for dx, dy in neighbors]

    S = sum(
        D.T.dot(scipy.sparse.diags(regularization + np.abs(D.dot(a)))).dot(D)
        for D in Ds)

    V = scipy.sparse.bmat([
        [S, None],
        [None, S]])

    A = U.T.dot(U) + V

    inv_diag = 1.0 / A.diagonal()

    # upload matrix and inverse diagonal GPU
    A = cupyx.scipy.sparse.csr_matrix(A)
    inv_diag = cp.asarray(inv_diag)

    def precondition(x):
        return x * inv_diag

    foreground = np.zeros((h, w, 3))
    background = np.zeros((h, w, 3))

    # For each color channel
    for channel in range(3):
        if print_info:
            print("solving channel %d" % (1 + channel))

        image_channel = image[:, :, channel].flatten()

        b = U.T.dot(image_channel)

        b = cp.asarray(b)

        norm_b = cp.linalg.norm(b)

        x = cp.asarray(np.concatenate([image_channel]*2))

        r = b - A.dot(x)
        z = precondition(r)
        p = z.copy()
        rz = np.inner(r, z)
        for iteration in range(max_iter):
            Ap = A.dot(p)
            alpha = rz / cp.inner(p, Ap)
            x += alpha * p
            r -= alpha * Ap

            residual_error = cp.linalg.norm(r)

            if print_info:
                print("iteration %05d - residual error %e" % (
                    iteration,
                    residual_error))

            if residual_error < atol or residual_error < rtol * norm_b:
                break

            z = precondition(r)
            beta = 1.0 / rz
            rz = cp.inner(r, z)
            beta *= rz
            p *= beta
            p += z

        fb = cp.asnumpy(x)

        foreground[:, :, channel] = fb[:n].reshape(h, w)
        background[:, :, channel] = fb[n:].reshape(h, w)

    foreground = np.clip(foreground, 0, 1)
    background = np.clip(background, 0, 1)

    return foreground, background

def main():
    parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")

    parser.add_argument("input", help="Path of input image")
    parser.add_argument("alpha", help="Path of alpha image")
    parser.add_argument("output", help="Path of output image")

    args = parser.parse_args()

    image = load_image(args.input, "RGB")
    alpha = load_image(args.alpha, "GRAY")

    foreground, background = estimate_fb_cf_cupy(
        image,
        alpha,
        regularization=1e-3,
        rtol=1e-4,
        print_info=True)

    save_image(args.output, stack_images(foreground, alpha))

if __name__ == "__main__":
    main()

The estimate_fb_ml()-method could also be adapted to work on the GPU, but CuPy and PyTorch are not a great fit for that because most of the time would be spent copying around memory and doing very little calculations. There will be a fork of this library (possibly in a few weeks/months) which uses Numba instead of NumPy to accelerate the ml method which is much faster. You can try this code for now:

from matting import load_image, save_image, stack_images
from numba import njit
import numpy as np
import argparse

@njit("void(f4[:, :, :], f4[:, :, :])", cache=True)
def _resize_nearest_3d(dst, src):
    h_src, w_src, depth = src.shape
    h_dst, w_dst, depth = dst.shape

    for y_dst in range(h_dst):
        for x_dst in range(w_dst):
            x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
            y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))

            for c in range(depth):
                dst[y_dst, x_dst, c] = src[y_src, x_src, c]

@njit("void(f4[:, :], f4[:, :])", cache=True)
def _resize_nearest_2d(dst, src):
    h_src, w_src = src.shape
    h_dst, w_dst = dst.shape

    for y_dst in range(h_dst):
        for x_dst in range(w_dst):
            x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
            y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))

            dst[y_dst, x_dst] = src[y_src, x_src]

@njit("Tuple((f4[:, :, :], f4[:, :, :]))(f4[:, :, :], f4[:, :], f4, i4, i4, i4)", cache=True)
def _estimate_fb_ml(
    input_image,
    input_alpha,
    regularization,
    n_small_iterations,
    n_big_iterations,
    small_size,
):  
    h0, w0, depth = input_image.shape

    dtype = np.float32

    w_prev = 1
    h_prev = 1

    F_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)
    B_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)

    n_levels = int(np.ceil(np.log2(max(w0, h0))))

    for i_level in range(n_levels + 1):
        w = round(w0 ** (i_level / n_levels))
        h = round(h0 ** (i_level / n_levels))

        image = np.empty((h, w, depth), dtype=dtype)
        alpha = np.empty((h, w), dtype=dtype)

        _resize_nearest_3d(image, input_image)
        _resize_nearest_2d(alpha, input_alpha)

        F = np.empty((h, w, depth), dtype=dtype)
        B = np.empty((h, w, depth), dtype=dtype)

        _resize_nearest_3d(F, F_prev)
        _resize_nearest_3d(B, B_prev)

        if w <= small_size and h <= small_size:
            n_iter = n_small_iterations
        else:
            n_iter = n_big_iterations

        b = np.zeros((2, depth), dtype=dtype)

        dx = [-1, 1, 0, 0]
        dy = [0, 0, -1, 1]

        for i_iter in range(n_iter):
            for y in range(h):
                for x in range(w):
                    a0 = alpha[y, x];
                    a1 = 1.0 - a0

                    a00 = a0 * a0
                    a01 = a0 * a1
                    # a10 = a01 due to symmetry, can be omitted
                    a11 = a1 * a1

                    for c in range(depth):
                        b[0, c] = a0 * image[y, x, c]
                        b[1, c] = a1 * image[y, x, c]

                    for d in range(4):
                        x2 = max(0, min(w - 1, x + dx[d]))
                        y2 = max(0, min(h - 1, y + dy[d]))

                        da = regularization + abs(a0 - alpha[y2, x2])

                        a00 += da
                        a11 += da

                        for c in range(depth):
                            b[0, c] += da * F[y2, x2, c]
                            b[1, c] += da * B[y2, x2, c]

                    determinant = a00 * a11 - a01 * a01

                    inv_det = 1.0 / determinant

                    b00 = inv_det * a11
                    b01 = inv_det * -a01
                    b11 = inv_det * a00

                    for c in range(depth):
                        F_c = b00 * b[0, c] + b01 * b[1, c];
                        B_c = b01 * b[0, c] + b11 * b[1, c];

                        F_c = max(0.0, min(1.0, F_c))
                        B_c = max(0.0, min(1.0, B_c))

                        F[y, x, c] = F_c
                        B[y, x, c] = B_c

        F_prev = F
        B_prev = B

        w_prev = w
        h_prev = h

    return F, B

def estimate_fb_ml_numba(
    image,
    alpha,
    regularization=1e-5,
    n_small_iterations=10,
    n_big_iterations=2,
    small_size=32,
):
    return _estimate_fb_ml(
        image.astype(np.float32),
        alpha.astype(np.float32),
        regularization,
        n_small_iterations,
        n_big_iterations,
        small_size)

def main():
    parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")

    parser.add_argument("input", help="Path of input image")
    parser.add_argument("alpha", help="Path of alpha image")
    parser.add_argument("output", help="Path of output image")

    args = parser.parse_args()

    image = load_image(args.input, "RGB")
    alpha = load_image(args.alpha, "GRAY")

    foreground, background = estimate_fb_ml_numba(image, alpha)

    save_image(args.output, stack_images(foreground, alpha))

if __name__ == "__main__":
    main()

Nice, good to know that CuPy is good enough to accelerate the ml method. Which GPU did you benchmark on?

And is this fast enough for you? I could also look into accelerating the ml method with handwritten CUDA (or OpenCL or even GL compute shaders), should be even faster.

thank you all for your awesome work. will be nice to see the OpenGL or GL compute shaders versions.

Thanks, I'll try to implement it.

There is something wrong in the estimate_fb_cf_cupy() when I run it

Do you remember what the error was?

asked "estimate_fb_ml()function is base on which paper and "linear equation system" is what" mean estimate_fb_ml()is base on which linear equation system

Basically, there are as many linear equation systems as there are pixels in the image, but each one is only a 2x2-matrix 'A' with three 2-element vectors 'b':

https://github.com/99991/matting/blob/6c2c258202e7a6d735329293d61698b0001c2caa/matting/foreground_background.py#L221

I want to put it into pytorch in the future !

I am not sure if pytorch would be the right place for that. They are probably only interesting in neural networks.

And nice to see your paper published successfully！

Hopefully soon, but the review process usually takes a few months.

I put it into StackOverflow AttributeError: module 'cupy' has no attribute 'cupyx'

Maybe there's something wrong in my cupy.I will change a docker to test it .

Thank you for investigating this error. Apparently I was supposed to import cupyx differently, but it worked anyway with the CuPy version I was using. I have changed the code above to use import cupyx.scipy.sparse and A = cupyx.scipy.sparse.csr_matrix(A) instead.

MemoryError

The system of linear equations for the closed form solution requires a lot of memory, but the matrix A does not actually have to be constructed. Instead, the various matrices which make up A could be applied to some input vector directly. But the ml method looks more promising for now, so I'll focus on that first.

The compute shader implementation turned out to be more work than expected, so here is an OpenCL implementation instead. It takes about one second to process a 5712x4000 image on a Quadro P100, so I'd expect it to take less than 2 seconds on a GTX 1080.

import argparse
import numpy as np
from matting import load_image, save_image, stack_images
import pyopencl as cl

source = """
__kernel void resize_nearest(
    __global float *dst,
    __global const float *src,
    int w_src, int h_src,
    int w_dst, int h_dst,
    int depth
){
    int x_dst = get_global_id(0);
    int y_dst = get_global_id(1);

    if (x_dst >= w_dst || y_dst >= h_dst) return;

    int x_src = min(x_dst * w_src / w_dst, w_src - 1);
    int y_src = min(y_dst * h_src / h_dst, h_src - 1);

    __global       float *ptr_dst = dst + (x_dst + y_dst * w_dst) * depth;
    __global const float *ptr_src = src + (x_src + y_src * w_src) * depth;

    for (int channel = 0; channel < depth; channel++){
        ptr_dst[channel] = ptr_src[channel];
    }
}

__kernel void ml_iteration(
    __global       float *F,
    __global       float *B,
    __global const float *F_prev,
    __global const float *B_prev,
    __global const float *image,
    __global const float *alpha,
    int w,
    int h,
    float regularization
){
    int x = get_global_id(0);
    int y = get_global_id(1);

    int i = x + y * w;

    if (x >= w || y >= h) return;

    float a0 = alpha[i];
    float a1 = 1.0f - a0;

    float b00 = a0 * image[i * 3 + 0];
    float b01 = a0 * image[i * 3 + 1];
    float b02 = a0 * image[i * 3 + 2];

    float b10 = a1 * image[i * 3 + 0];
    float b11 = a1 * image[i * 3 + 1];
    float b12 = a1 * image[i * 3 + 2];

    int js[4] = {
        max(    0, x - 1) + y * w,
        min(w - 1, x + 1) + y * w,
        x + max(    0, y - 1) * w,
        x + min(h - 1, y + 1) * w,
    };

    float a_sum = 0.0f;

    for (int d = 0; d < 4; d++){
        int j = js[d];

        float da = regularization + fabs(a0 - alpha[j]);

        a_sum += da;

        b00 += da * F_prev[j * 3 + 0];
        b01 += da * F_prev[j * 3 + 1];
        b02 += da * F_prev[j * 3 + 2];

        b10 += da * B_prev[j * 3 + 0];
        b11 += da * B_prev[j * 3 + 1];
        b12 += da * B_prev[j * 3 + 2];
    }

    float a00 = a0 * a0 + a_sum;
    float a11 = a1 * a1 + a_sum;
    float a01 = a0 * a1;

    float inv_det = 1.0f / (a00 * a11 - a01 * a01);

    F[i * 3 + 0] = fmax(0.0f, fmin(1.0f, inv_det * (a11 * b00 - a01 * b10)));
    F[i * 3 + 1] = fmax(0.0f, fmin(1.0f, inv_det * (a11 * b01 - a01 * b11)));
    F[i * 3 + 2] = fmax(0.0f, fmin(1.0f, inv_det * (a11 * b02 - a01 * b12)));
    B[i * 3 + 0] = fmax(0.0f, fmin(1.0f, inv_det * (a00 * b10 - a01 * b00)));
    B[i * 3 + 1] = fmax(0.0f, fmin(1.0f, inv_det * (a00 * b11 - a01 * b01)));
    B[i * 3 + 2] = fmax(0.0f, fmin(1.0f, inv_det * (a00 * b12 - a01 * b02)));
} 
"""

# TODO properly release those objects:
platform = cl.get_platforms()[0]
device = platform.get_devices(cl.device_type.GPU)[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)
program = cl.Program(context, source).build()

def estimate_fb_ml_cl(
    input_image,
    input_alpha,
    regularization=1e-5,
    n_small_iterations=10,
    n_big_iterations=2,
    small_size=32,
):
    def upload(array):
        hostbuf = array.astype(np.float32).flatten()
        return cl.Buffer(context, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=hostbuf)

    def alloc(*shape):
        n = np.product(shape)
        return cl.Buffer(context, cl.mem_flags.READ_WRITE, n * 4)

    def download(device_buf, shape):
        host_buf = np.empty(shape, dtype=np.float32)

        # TODO does this block?
        cl.enqueue_copy(queue, host_buf, device_buf)

        return host_buf.reshape(shape)

    h0, w0, depth = input_image.shape

    assert(depth == 3)

    n = h0 * w0 * depth

    input_image = upload(input_image)
    input_alpha = upload(input_alpha)

    F_prev = alloc(n)
    B_prev = alloc(n)

    F = alloc(n)
    B = alloc(n)

    image_level = alloc(n)
    alpha_level = alloc(h0 * w0)

    n_levels = (max(w0, h0) - 1).bit_length()

    def resize_nearest(dst, src, w_src, h_src, w_dst, h_dst, depth):
        program.resize_nearest(queue, (w_dst, h_dst), None, dst, src, *np.int32([w_src, h_src, w_dst, h_dst, depth]))

    w_prev = 1
    h_prev = 1

    resize_nearest(F, input_image, w0, h0, w_prev, h_prev, depth)
    resize_nearest(B, input_image, w0, h0, w_prev, h_prev, depth)

    for i_level in range(n_levels + 1):
        w = round(w0 ** (i_level / n_levels))
        h = round(h0 ** (i_level / n_levels))

        resize_nearest(image_level, input_image, w0, h0, w, h, depth)
        resize_nearest(alpha_level, input_alpha, w0, h0, w, h, 1)

        resize_nearest(F_prev, F, w_prev, h_prev, w, h, depth)
        resize_nearest(B_prev, B, w_prev, h_prev, w, h, depth)

        # Do more iterations for low resolution.
        n_iter = n_big_iterations
        if min(w, h) <= small_size:
            n_iter = n_small_iterations

        for i_iter in range(n_iter):
            program.ml_iteration(
                queue,
                (w, h),
                None,
                F,
                B,
                F_prev,
                B_prev,
                image_level,
                alpha_level,
                np.int32(w),
                np.int32(h),
                np.float32(regularization))

            F_prev, F = F, F_prev
            B_prev, B = B, B_prev

        w_prev = w
        h_prev = h

    F_host = download(F, (h0, w0, depth))
    B_host = download(B, (h0, w0, depth))

    for buf in [
        F, B,
        F_prev, B_prev,
        input_image, input_alpha,
        image_level, alpha_level
    ]:
        buf.release()

    return F_host, B_host

def main():
    parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")

    parser.add_argument("input", help="Path of input image")
    parser.add_argument("alpha", help="Path of alpha image")
    parser.add_argument("output", help="Path of output image")

    args = parser.parse_args()

    image = load_image(args.input, "RGB")
    alpha = load_image(args.alpha, "GRAY")

    foreground, background = estimate_fb_ml_cl(image, alpha)

    save_image(args.output, stack_images(foreground, alpha))

if __name__ == "__main__":
    main()

EDIT: CUDA implementation runs 40% faster (0.6 seconds instead of 1.0 seconds) although the code is mostly the same as the OpenCL implementation.

import argparse
import numpy as np
from matting import load_image, save_image, stack_images
import cupy as cp

_resize_nearest = cp.RawKernel(r"""
extern "C" __global__
void resize_nearest(
    float *dst,
    const float *src,
    int w_src, int h_src,
    int w_dst, int h_dst,
    int depth
){
    int x_dst = blockDim.x * blockIdx.x + threadIdx.x;
    int y_dst = blockDim.y * blockIdx.y + threadIdx.y;

    if (x_dst >= w_dst || y_dst >= h_dst) return;

    int x_src = min(x_dst * w_src / w_dst, w_src - 1);
    int y_src = min(y_dst * h_src / h_dst, h_src - 1);

          float *ptr_dst = dst + (x_dst + y_dst * w_dst) * depth;
    const float *ptr_src = src + (x_src + y_src * w_src) * depth;

    for (int channel = 0; channel < depth; channel++){
        ptr_dst[channel] = ptr_src[channel];
    }
}
""", "resize_nearest")

ml_iteration = cp.RawKernel(r"""
extern "C" __global__
void ml_iteration(
          float *F,
          float *B,
    const float *F_prev,
    const float *B_prev,
    const float *image,
    const float *alpha,
    int w,
    int h,
    float regularization
){
    int x = blockDim.x * blockIdx.x + threadIdx.x;
    int y = blockDim.y * blockIdx.y + threadIdx.y;

    int i = x + y * w;

    if (x >= w || y >= h) return;

    float a0 = alpha[i];
    float a1 = 1.0f - a0;

    float b00 = a0 * image[i * 3 + 0];
    float b01 = a0 * image[i * 3 + 1];
    float b02 = a0 * image[i * 3 + 2];

    float b10 = a1 * image[i * 3 + 0];
    float b11 = a1 * image[i * 3 + 1];
    float b12 = a1 * image[i * 3 + 2];

    int js[4] = {
        max(    0, x - 1) + y * w,
        min(w - 1, x + 1) + y * w,
        x + max(    0, y - 1) * w,
        x + min(h - 1, y + 1) * w,
    };

    float a_sum = 0.0f;

    for (int d = 0; d < 4; d++){
        int j = js[d];

        float da = regularization + fabsf(a0 - alpha[j]);

        a_sum += da;

        b00 += da * F_prev[j * 3 + 0];
        b01 += da * F_prev[j * 3 + 1];
        b02 += da * F_prev[j * 3 + 2];

        b10 += da * B_prev[j * 3 + 0];
        b11 += da * B_prev[j * 3 + 1];
        b12 += da * B_prev[j * 3 + 2];
    }

    float a00 = a0 * a0 + a_sum;
    float a11 = a1 * a1 + a_sum;
    float a01 = a0 * a1;

    float inv_det = 1.0f / (a00 * a11 - a01 * a01);

    F[i * 3 + 0] = fmaxf(0.0f, fminf(1.0f, inv_det * (a11 * b00 - a01 * b10)));
    F[i * 3 + 1] = fmaxf(0.0f, fminf(1.0f, inv_det * (a11 * b01 - a01 * b11)));
    F[i * 3 + 2] = fmaxf(0.0f, fminf(1.0f, inv_det * (a11 * b02 - a01 * b12)));

    B[i * 3 + 0] = fmaxf(0.0f, fminf(1.0f, inv_det * (a00 * b10 - a01 * b00)));
    B[i * 3 + 1] = fmaxf(0.0f, fminf(1.0f, inv_det * (a00 * b11 - a01 * b01)));
    B[i * 3 + 2] = fmaxf(0.0f, fminf(1.0f, inv_det * (a00 * b12 - a01 * b02)));
} 
""", "ml_iteration")

def estimate_fb_ml_cl(
    input_image,
    input_alpha,
    regularization=1e-5,
    n_small_iterations=10,
    n_big_iterations=2,
    small_size=32,
    block_size=(32, 32),
):
    h0, w0, depth = input_image.shape

    assert(depth == 3)

    n = h0 * w0 * depth

    input_image = cp.asarray(input_image.astype(np.float32).flatten())
    input_alpha = cp.asarray(input_alpha.astype(np.float32).flatten())

    F_prev = cp.zeros(n, dtype=cp.float32)
    B_prev = cp.zeros(n, dtype=cp.float32)

    F = cp.zeros(n, dtype=cp.float32)
    B = cp.zeros(n, dtype=cp.float32)

    image_level = cp.zeros(n, dtype=cp.float32)
    alpha_level = cp.zeros(h0 * w0, dtype=cp.float32)

    n_levels = (max(w0, h0) - 1).bit_length()

    def div_round_up(x, n):
        return (x + n - 1) // n

    def resize_nearest(dst, src, w_src, h_src, w_dst, h_dst, depth):
        grid_size = (
            div_round_up(w_dst, block_size[0]),
            div_round_up(h_dst, block_size[1]))

        _resize_nearest(grid_size, block_size, (
            dst, src,
            w_src, h_src,
            w_dst, h_dst,
            depth,
        ))

    w_prev = 1
    h_prev = 1

    resize_nearest(F, input_image, w0, h0, w_prev, h_prev, depth)
    resize_nearest(B, input_image, w0, h0, w_prev, h_prev, depth)

    for i_level in range(n_levels + 1):
        w = round(w0 ** (i_level / n_levels))
        h = round(h0 ** (i_level / n_levels))

        resize_nearest(image_level, input_image, w0, h0, w, h, depth)
        resize_nearest(alpha_level, input_alpha, w0, h0, w, h, 1)

        resize_nearest(F_prev, F, w_prev, h_prev, w, h, depth)
        resize_nearest(B_prev, B, w_prev, h_prev, w, h, depth)

        # Do more iterations for low resolution.
        n_iter = n_big_iterations
        if min(w, h) <= small_size:
            n_iter = n_small_iterations

        grid_size = (
            div_round_up(w, block_size[0]),
            div_round_up(h, block_size[1]))

        for i_iter in range(n_iter):
            ml_iteration(grid_size, block_size, (
                F,
                B,
                F_prev,
                B_prev,
                image_level,
                alpha_level,
                w, h,
                np.float32(regularization),
            ))

            F_prev, F = F, F_prev
            B_prev, B = B, B_prev

        w_prev = w
        h_prev = h

    F_host = cp.asnumpy(F).reshape(h0, w0, depth)
    B_host = cp.asnumpy(B).reshape(h0, w0, depth)

    return F_host, B_host

def main():
    parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")

    parser.add_argument("input", help="Path of input image")
    parser.add_argument("alpha", help="Path of alpha image")
    parser.add_argument("output", help="Path of output image")

    args = parser.parse_args()

    image = load_image(args.input, "RGB")
    alpha = load_image(args.alpha, "GRAY")

    foreground, background = estimate_fb_ml_cl(image, alpha)

    save_image("foreground.png", foreground)
    save_image("background.png", background)
    save_image(args.output, stack_images(foreground, alpha))

if __name__ == "__main__":
    main()

cool! The work you did is perfect ! thanks for your work ! I run the pyopencl code yesterday ,an error thrown out :

Traceback (most recent call last):
  File "opencl_estimate.py", line 152, in <module>
    platform = cl.get_platforms()[0]
pyopencl._cl.LogicError: clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR

I checked it ,found the reason is No ICDs were found.I solved it !

Here is the result I tested :

Again, great work! I can obtain some decent results most of the time now. Occasionally, I ran into the problem with rough edges (zigzags) when the image resolution is low and the contrast is weak between the foreground and background. Do you have a good solution to propose? I am thinking of using some sort of anti-aliasing algorithm with Shader, but I have no clue as to where to start and how to do it in python. Any suggestions?

It might be sufficient to simply increase the number of iterations (n_small_iterations, n_big_iterations) and/or small_size. The default number of iterations were sufficient for my test images. Can you share your images?

If increasing iterations does not help, a more elaborate upsampling method can be used. For example, one could use bilinear upsampling instead of nearest neighbor upsampling in the resize_nearest function. Maybe I'll find a few minutes to implement that next week.

One of the authors of remove.bg says they are using multiple u-nets, i.e. a form of neural network.

But I don't think that's all they do because the thickness of the alpha boundary is the same everywhere.

alpha

Maybe their neural networks are just generating a binary segmentation which they then follow up by an erosion and a blur operation?

99991 / matting

Estimate foreground and background put into GPU #4