Closed onefish51 closed 4 years ago
Hi, Thanks for your work! It's very nice!
Thank you! :D
And then I want to know estimate_fb_ml()function is base on which paper and "linear equation system" is what, and do you have some idea ?
Maybe the term system of linear equations is more common. The paper is currently being reviewed.
But I find the time it costs is big. So I want to put it to GPU by Cupy or pytorch.
The estimate_fb_cf()
-method does not work great on GPU because conjugate gradient descent to solve the system of linear equations converges very slowly without preconditioner. The incomplete thresholded Cholesky preconditioner works best, but it can not be parallelized easily. The Jacobi preconditioner can be used instead, but it is not as good. See the following code.
You can increase the regularization
parameter or decrease rtol
to get faster results but blurrier images.
from matting import load_image, save_image, stack_images
from matting.util import solve_cg, vec_vec_outer, pixel_coordinates
from matting.util import sparse_conv_matrix
import argparse
import numpy as np
import scipy.sparse
import cupy as cp
import cupyx.scipy.sparse
def estimate_fb_cf_cupy(
image,
alpha,
regularization=1e-5,
neighbors=[(-1, 0), (1, 0), (0, -1), (0, 1)],
max_iter=10000,
atol=0.0,
rtol=1e-5,
print_info=False,
):
h, w = image.shape[:2]
n = w * h
a = alpha.flatten()
# Build sparse linear equation system
U = scipy.sparse.bmat([[
scipy.sparse.diags(a),
scipy.sparse.diags(1 - a)]])
# Directional derivative matrices
Ds = [
sparse_conv_matrix(w, h, [0, dx], [0, dy], [1.0, -1.0])
for dx, dy in neighbors]
S = sum(
D.T.dot(scipy.sparse.diags(regularization + np.abs(D.dot(a)))).dot(D)
for D in Ds)
V = scipy.sparse.bmat([
[S, None],
[None, S]])
A = U.T.dot(U) + V
inv_diag = 1.0 / A.diagonal()
# upload matrix and inverse diagonal GPU
A = cupyx.scipy.sparse.csr_matrix(A)
inv_diag = cp.asarray(inv_diag)
def precondition(x):
return x * inv_diag
foreground = np.zeros((h, w, 3))
background = np.zeros((h, w, 3))
# For each color channel
for channel in range(3):
if print_info:
print("solving channel %d" % (1 + channel))
image_channel = image[:, :, channel].flatten()
b = U.T.dot(image_channel)
b = cp.asarray(b)
norm_b = cp.linalg.norm(b)
x = cp.asarray(np.concatenate([image_channel]*2))
r = b - A.dot(x)
z = precondition(r)
p = z.copy()
rz = np.inner(r, z)
for iteration in range(max_iter):
Ap = A.dot(p)
alpha = rz / cp.inner(p, Ap)
x += alpha * p
r -= alpha * Ap
residual_error = cp.linalg.norm(r)
if print_info:
print("iteration %05d - residual error %e" % (
iteration,
residual_error))
if residual_error < atol or residual_error < rtol * norm_b:
break
z = precondition(r)
beta = 1.0 / rz
rz = cp.inner(r, z)
beta *= rz
p *= beta
p += z
fb = cp.asnumpy(x)
foreground[:, :, channel] = fb[:n].reshape(h, w)
background[:, :, channel] = fb[n:].reshape(h, w)
foreground = np.clip(foreground, 0, 1)
background = np.clip(background, 0, 1)
return foreground, background
def main():
parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")
parser.add_argument("input", help="Path of input image")
parser.add_argument("alpha", help="Path of alpha image")
parser.add_argument("output", help="Path of output image")
args = parser.parse_args()
image = load_image(args.input, "RGB")
alpha = load_image(args.alpha, "GRAY")
foreground, background = estimate_fb_cf_cupy(
image,
alpha,
regularization=1e-3,
rtol=1e-4,
print_info=True)
save_image(args.output, stack_images(foreground, alpha))
if __name__ == "__main__":
main()
The estimate_fb_ml()
-method could also be adapted to work on the GPU, but CuPy
and PyTorch
are not a great fit for that because most of the time would be spent copying around memory and doing very little calculations. There will be a fork of this library (possibly in a few weeks/months) which uses Numba
instead of NumPy
to accelerate the ml
method which is much faster. You can try this code for now:
from matting import load_image, save_image, stack_images
from numba import njit
import numpy as np
import argparse
@njit("void(f4[:, :, :], f4[:, :, :])", cache=True)
def _resize_nearest_3d(dst, src):
h_src, w_src, depth = src.shape
h_dst, w_dst, depth = dst.shape
for y_dst in range(h_dst):
for x_dst in range(w_dst):
x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))
for c in range(depth):
dst[y_dst, x_dst, c] = src[y_src, x_src, c]
@njit("void(f4[:, :], f4[:, :])", cache=True)
def _resize_nearest_2d(dst, src):
h_src, w_src = src.shape
h_dst, w_dst = dst.shape
for y_dst in range(h_dst):
for x_dst in range(w_dst):
x_src = max(0, min(w_src - 1, x_dst * w_src // w_dst))
y_src = max(0, min(h_src - 1, y_dst * h_src // h_dst))
dst[y_dst, x_dst] = src[y_src, x_src]
@njit("Tuple((f4[:, :, :], f4[:, :, :]))(f4[:, :, :], f4[:, :], f4, i4, i4, i4)", cache=True)
def _estimate_fb_ml(
input_image,
input_alpha,
regularization,
n_small_iterations,
n_big_iterations,
small_size,
):
h0, w0, depth = input_image.shape
dtype = np.float32
w_prev = 1
h_prev = 1
F_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)
B_prev = np.empty((h_prev, w_prev, depth), dtype=dtype)
n_levels = int(np.ceil(np.log2(max(w0, h0))))
for i_level in range(n_levels + 1):
w = round(w0 ** (i_level / n_levels))
h = round(h0 ** (i_level / n_levels))
image = np.empty((h, w, depth), dtype=dtype)
alpha = np.empty((h, w), dtype=dtype)
_resize_nearest_3d(image, input_image)
_resize_nearest_2d(alpha, input_alpha)
F = np.empty((h, w, depth), dtype=dtype)
B = np.empty((h, w, depth), dtype=dtype)
_resize_nearest_3d(F, F_prev)
_resize_nearest_3d(B, B_prev)
if w <= small_size and h <= small_size:
n_iter = n_small_iterations
else:
n_iter = n_big_iterations
b = np.zeros((2, depth), dtype=dtype)
dx = [-1, 1, 0, 0]
dy = [0, 0, -1, 1]
for i_iter in range(n_iter):
for y in range(h):
for x in range(w):
a0 = alpha[y, x];
a1 = 1.0 - a0
a00 = a0 * a0
a01 = a0 * a1
# a10 = a01 due to symmetry, can be omitted
a11 = a1 * a1
for c in range(depth):
b[0, c] = a0 * image[y, x, c]
b[1, c] = a1 * image[y, x, c]
for d in range(4):
x2 = max(0, min(w - 1, x + dx[d]))
y2 = max(0, min(h - 1, y + dy[d]))
da = regularization + abs(a0 - alpha[y2, x2])
a00 += da
a11 += da
for c in range(depth):
b[0, c] += da * F[y2, x2, c]
b[1, c] += da * B[y2, x2, c]
determinant = a00 * a11 - a01 * a01
inv_det = 1.0 / determinant
b00 = inv_det * a11
b01 = inv_det * -a01
b11 = inv_det * a00
for c in range(depth):
F_c = b00 * b[0, c] + b01 * b[1, c];
B_c = b01 * b[0, c] + b11 * b[1, c];
F_c = max(0.0, min(1.0, F_c))
B_c = max(0.0, min(1.0, B_c))
F[y, x, c] = F_c
B[y, x, c] = B_c
F_prev = F
B_prev = B
w_prev = w
h_prev = h
return F, B
def estimate_fb_ml_numba(
image,
alpha,
regularization=1e-5,
n_small_iterations=10,
n_big_iterations=2,
small_size=32,
):
return _estimate_fb_ml(
image.astype(np.float32),
alpha.astype(np.float32),
regularization,
n_small_iterations,
n_big_iterations,
small_size)
def main():
parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")
parser.add_argument("input", help="Path of input image")
parser.add_argument("alpha", help="Path of alpha image")
parser.add_argument("output", help="Path of output image")
args = parser.parse_args()
image = load_image(args.input, "RGB")
alpha = load_image(args.alpha, "GRAY")
foreground, background = estimate_fb_ml_numba(image, alpha)
save_image(args.output, stack_images(foreground, alpha))
if __name__ == "__main__":
main()
Nice, good to know that CuPy is good enough to accelerate the ml method. Which GPU did you benchmark on?
And is this fast enough for you? I could also look into accelerating the ml method with handwritten CUDA (or OpenCL or even GL compute shaders), should be even faster.
thank you all for your awesome work. will be nice to see the OpenGL or GL compute shaders versions.
Thanks, I'll try to implement it.
There is something wrong in the estimate_fb_cf_cupy() when I run it
Do you remember what the error was?
asked "estimate_fb_ml()function is base on which paper and "linear equation system" is what" mean estimate_fb_ml()is base on which linear equation system
Basically, there are as many linear equation systems as there are pixels in the image, but each one is only a 2x2-matrix 'A' with three 2-element vectors 'b':
I want to put it into pytorch in the future !
I am not sure if pytorch would be the right place for that. They are probably only interesting in neural networks.
And nice to see your paper published successfully!
Hopefully soon, but the review process usually takes a few months.
I put it into StackOverflow AttributeError: module 'cupy' has no attribute 'cupyx'
Maybe there's something wrong in my cupy.I will change a docker to test it .
Thank you for investigating this error. Apparently I was supposed to import cupyx differently, but it worked anyway with the CuPy version I was using. I have changed the code above to use import cupyx.scipy.sparse
and A = cupyx.scipy.sparse.csr_matrix(A)
instead.
MemoryError
The system of linear equations for the closed form solution requires a lot of memory, but the matrix A
does not actually have to be constructed. Instead, the various matrices which make up A
could be applied to some input vector directly. But the ml
method looks more promising for now, so I'll focus on that first.
The compute shader implementation turned out to be more work than expected, so here is an OpenCL implementation instead. It takes about one second to process a 5712x4000 image on a Quadro P100, so I'd expect it to take less than 2 seconds on a GTX 1080.
import argparse
import numpy as np
from matting import load_image, save_image, stack_images
import pyopencl as cl
source = """
__kernel void resize_nearest(
__global float *dst,
__global const float *src,
int w_src, int h_src,
int w_dst, int h_dst,
int depth
){
int x_dst = get_global_id(0);
int y_dst = get_global_id(1);
if (x_dst >= w_dst || y_dst >= h_dst) return;
int x_src = min(x_dst * w_src / w_dst, w_src - 1);
int y_src = min(y_dst * h_src / h_dst, h_src - 1);
__global float *ptr_dst = dst + (x_dst + y_dst * w_dst) * depth;
__global const float *ptr_src = src + (x_src + y_src * w_src) * depth;
for (int channel = 0; channel < depth; channel++){
ptr_dst[channel] = ptr_src[channel];
}
}
__kernel void ml_iteration(
__global float *F,
__global float *B,
__global const float *F_prev,
__global const float *B_prev,
__global const float *image,
__global const float *alpha,
int w,
int h,
float regularization
){
int x = get_global_id(0);
int y = get_global_id(1);
int i = x + y * w;
if (x >= w || y >= h) return;
float a0 = alpha[i];
float a1 = 1.0f - a0;
float b00 = a0 * image[i * 3 + 0];
float b01 = a0 * image[i * 3 + 1];
float b02 = a0 * image[i * 3 + 2];
float b10 = a1 * image[i * 3 + 0];
float b11 = a1 * image[i * 3 + 1];
float b12 = a1 * image[i * 3 + 2];
int js[4] = {
max( 0, x - 1) + y * w,
min(w - 1, x + 1) + y * w,
x + max( 0, y - 1) * w,
x + min(h - 1, y + 1) * w,
};
float a_sum = 0.0f;
for (int d = 0; d < 4; d++){
int j = js[d];
float da = regularization + fabs(a0 - alpha[j]);
a_sum += da;
b00 += da * F_prev[j * 3 + 0];
b01 += da * F_prev[j * 3 + 1];
b02 += da * F_prev[j * 3 + 2];
b10 += da * B_prev[j * 3 + 0];
b11 += da * B_prev[j * 3 + 1];
b12 += da * B_prev[j * 3 + 2];
}
float a00 = a0 * a0 + a_sum;
float a11 = a1 * a1 + a_sum;
float a01 = a0 * a1;
float inv_det = 1.0f / (a00 * a11 - a01 * a01);
F[i * 3 + 0] = fmax(0.0f, fmin(1.0f, inv_det * (a11 * b00 - a01 * b10)));
F[i * 3 + 1] = fmax(0.0f, fmin(1.0f, inv_det * (a11 * b01 - a01 * b11)));
F[i * 3 + 2] = fmax(0.0f, fmin(1.0f, inv_det * (a11 * b02 - a01 * b12)));
B[i * 3 + 0] = fmax(0.0f, fmin(1.0f, inv_det * (a00 * b10 - a01 * b00)));
B[i * 3 + 1] = fmax(0.0f, fmin(1.0f, inv_det * (a00 * b11 - a01 * b01)));
B[i * 3 + 2] = fmax(0.0f, fmin(1.0f, inv_det * (a00 * b12 - a01 * b02)));
}
"""
# TODO properly release those objects:
platform = cl.get_platforms()[0]
device = platform.get_devices(cl.device_type.GPU)[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)
program = cl.Program(context, source).build()
def estimate_fb_ml_cl(
input_image,
input_alpha,
regularization=1e-5,
n_small_iterations=10,
n_big_iterations=2,
small_size=32,
):
def upload(array):
hostbuf = array.astype(np.float32).flatten()
return cl.Buffer(context, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=hostbuf)
def alloc(*shape):
n = np.product(shape)
return cl.Buffer(context, cl.mem_flags.READ_WRITE, n * 4)
def download(device_buf, shape):
host_buf = np.empty(shape, dtype=np.float32)
# TODO does this block?
cl.enqueue_copy(queue, host_buf, device_buf)
return host_buf.reshape(shape)
h0, w0, depth = input_image.shape
assert(depth == 3)
n = h0 * w0 * depth
input_image = upload(input_image)
input_alpha = upload(input_alpha)
F_prev = alloc(n)
B_prev = alloc(n)
F = alloc(n)
B = alloc(n)
image_level = alloc(n)
alpha_level = alloc(h0 * w0)
n_levels = (max(w0, h0) - 1).bit_length()
def resize_nearest(dst, src, w_src, h_src, w_dst, h_dst, depth):
program.resize_nearest(queue, (w_dst, h_dst), None, dst, src, *np.int32([w_src, h_src, w_dst, h_dst, depth]))
w_prev = 1
h_prev = 1
resize_nearest(F, input_image, w0, h0, w_prev, h_prev, depth)
resize_nearest(B, input_image, w0, h0, w_prev, h_prev, depth)
for i_level in range(n_levels + 1):
w = round(w0 ** (i_level / n_levels))
h = round(h0 ** (i_level / n_levels))
resize_nearest(image_level, input_image, w0, h0, w, h, depth)
resize_nearest(alpha_level, input_alpha, w0, h0, w, h, 1)
resize_nearest(F_prev, F, w_prev, h_prev, w, h, depth)
resize_nearest(B_prev, B, w_prev, h_prev, w, h, depth)
# Do more iterations for low resolution.
n_iter = n_big_iterations
if min(w, h) <= small_size:
n_iter = n_small_iterations
for i_iter in range(n_iter):
program.ml_iteration(
queue,
(w, h),
None,
F,
B,
F_prev,
B_prev,
image_level,
alpha_level,
np.int32(w),
np.int32(h),
np.float32(regularization))
F_prev, F = F, F_prev
B_prev, B = B, B_prev
w_prev = w
h_prev = h
F_host = download(F, (h0, w0, depth))
B_host = download(B, (h0, w0, depth))
for buf in [
F, B,
F_prev, B_prev,
input_image, input_alpha,
image_level, alpha_level
]:
buf.release()
return F_host, B_host
def main():
parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")
parser.add_argument("input", help="Path of input image")
parser.add_argument("alpha", help="Path of alpha image")
parser.add_argument("output", help="Path of output image")
args = parser.parse_args()
image = load_image(args.input, "RGB")
alpha = load_image(args.alpha, "GRAY")
foreground, background = estimate_fb_ml_cl(image, alpha)
save_image(args.output, stack_images(foreground, alpha))
if __name__ == "__main__":
main()
EDIT: CUDA implementation runs 40% faster (0.6 seconds instead of 1.0 seconds) although the code is mostly the same as the OpenCL implementation.
import argparse
import numpy as np
from matting import load_image, save_image, stack_images
import cupy as cp
_resize_nearest = cp.RawKernel(r"""
extern "C" __global__
void resize_nearest(
float *dst,
const float *src,
int w_src, int h_src,
int w_dst, int h_dst,
int depth
){
int x_dst = blockDim.x * blockIdx.x + threadIdx.x;
int y_dst = blockDim.y * blockIdx.y + threadIdx.y;
if (x_dst >= w_dst || y_dst >= h_dst) return;
int x_src = min(x_dst * w_src / w_dst, w_src - 1);
int y_src = min(y_dst * h_src / h_dst, h_src - 1);
float *ptr_dst = dst + (x_dst + y_dst * w_dst) * depth;
const float *ptr_src = src + (x_src + y_src * w_src) * depth;
for (int channel = 0; channel < depth; channel++){
ptr_dst[channel] = ptr_src[channel];
}
}
""", "resize_nearest")
ml_iteration = cp.RawKernel(r"""
extern "C" __global__
void ml_iteration(
float *F,
float *B,
const float *F_prev,
const float *B_prev,
const float *image,
const float *alpha,
int w,
int h,
float regularization
){
int x = blockDim.x * blockIdx.x + threadIdx.x;
int y = blockDim.y * blockIdx.y + threadIdx.y;
int i = x + y * w;
if (x >= w || y >= h) return;
float a0 = alpha[i];
float a1 = 1.0f - a0;
float b00 = a0 * image[i * 3 + 0];
float b01 = a0 * image[i * 3 + 1];
float b02 = a0 * image[i * 3 + 2];
float b10 = a1 * image[i * 3 + 0];
float b11 = a1 * image[i * 3 + 1];
float b12 = a1 * image[i * 3 + 2];
int js[4] = {
max( 0, x - 1) + y * w,
min(w - 1, x + 1) + y * w,
x + max( 0, y - 1) * w,
x + min(h - 1, y + 1) * w,
};
float a_sum = 0.0f;
for (int d = 0; d < 4; d++){
int j = js[d];
float da = regularization + fabsf(a0 - alpha[j]);
a_sum += da;
b00 += da * F_prev[j * 3 + 0];
b01 += da * F_prev[j * 3 + 1];
b02 += da * F_prev[j * 3 + 2];
b10 += da * B_prev[j * 3 + 0];
b11 += da * B_prev[j * 3 + 1];
b12 += da * B_prev[j * 3 + 2];
}
float a00 = a0 * a0 + a_sum;
float a11 = a1 * a1 + a_sum;
float a01 = a0 * a1;
float inv_det = 1.0f / (a00 * a11 - a01 * a01);
F[i * 3 + 0] = fmaxf(0.0f, fminf(1.0f, inv_det * (a11 * b00 - a01 * b10)));
F[i * 3 + 1] = fmaxf(0.0f, fminf(1.0f, inv_det * (a11 * b01 - a01 * b11)));
F[i * 3 + 2] = fmaxf(0.0f, fminf(1.0f, inv_det * (a11 * b02 - a01 * b12)));
B[i * 3 + 0] = fmaxf(0.0f, fminf(1.0f, inv_det * (a00 * b10 - a01 * b00)));
B[i * 3 + 1] = fmaxf(0.0f, fminf(1.0f, inv_det * (a00 * b11 - a01 * b01)));
B[i * 3 + 2] = fmaxf(0.0f, fminf(1.0f, inv_det * (a00 * b12 - a01 * b02)));
}
""", "ml_iteration")
def estimate_fb_ml_cl(
input_image,
input_alpha,
regularization=1e-5,
n_small_iterations=10,
n_big_iterations=2,
small_size=32,
block_size=(32, 32),
):
h0, w0, depth = input_image.shape
assert(depth == 3)
n = h0 * w0 * depth
input_image = cp.asarray(input_image.astype(np.float32).flatten())
input_alpha = cp.asarray(input_alpha.astype(np.float32).flatten())
F_prev = cp.zeros(n, dtype=cp.float32)
B_prev = cp.zeros(n, dtype=cp.float32)
F = cp.zeros(n, dtype=cp.float32)
B = cp.zeros(n, dtype=cp.float32)
image_level = cp.zeros(n, dtype=cp.float32)
alpha_level = cp.zeros(h0 * w0, dtype=cp.float32)
n_levels = (max(w0, h0) - 1).bit_length()
def div_round_up(x, n):
return (x + n - 1) // n
def resize_nearest(dst, src, w_src, h_src, w_dst, h_dst, depth):
grid_size = (
div_round_up(w_dst, block_size[0]),
div_round_up(h_dst, block_size[1]))
_resize_nearest(grid_size, block_size, (
dst, src,
w_src, h_src,
w_dst, h_dst,
depth,
))
w_prev = 1
h_prev = 1
resize_nearest(F, input_image, w0, h0, w_prev, h_prev, depth)
resize_nearest(B, input_image, w0, h0, w_prev, h_prev, depth)
for i_level in range(n_levels + 1):
w = round(w0 ** (i_level / n_levels))
h = round(h0 ** (i_level / n_levels))
resize_nearest(image_level, input_image, w0, h0, w, h, depth)
resize_nearest(alpha_level, input_alpha, w0, h0, w, h, 1)
resize_nearest(F_prev, F, w_prev, h_prev, w, h, depth)
resize_nearest(B_prev, B, w_prev, h_prev, w, h, depth)
# Do more iterations for low resolution.
n_iter = n_big_iterations
if min(w, h) <= small_size:
n_iter = n_small_iterations
grid_size = (
div_round_up(w, block_size[0]),
div_round_up(h, block_size[1]))
for i_iter in range(n_iter):
ml_iteration(grid_size, block_size, (
F,
B,
F_prev,
B_prev,
image_level,
alpha_level,
w, h,
np.float32(regularization),
))
F_prev, F = F, F_prev
B_prev, B = B, B_prev
w_prev = w
h_prev = h
F_host = cp.asnumpy(F).reshape(h0, w0, depth)
B_host = cp.asnumpy(B).reshape(h0, w0, depth)
return F_host, B_host
def main():
parser = argparse.ArgumentParser(description="Specify input image, alpha and output path.")
parser.add_argument("input", help="Path of input image")
parser.add_argument("alpha", help="Path of alpha image")
parser.add_argument("output", help="Path of output image")
args = parser.parse_args()
image = load_image(args.input, "RGB")
alpha = load_image(args.alpha, "GRAY")
foreground, background = estimate_fb_ml_cl(image, alpha)
save_image("foreground.png", foreground)
save_image("background.png", background)
save_image(args.output, stack_images(foreground, alpha))
if __name__ == "__main__":
main()
cool! The work you did is perfect ! thanks for your work ! I run the pyopencl code yesterday ,an error thrown out :
Traceback (most recent call last):
File "opencl_estimate.py", line 152, in <module>
platform = cl.get_platforms()[0]
pyopencl._cl.LogicError: clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR
I checked it ,found the reason is No ICDs were found.
I solved it !
Here is the result I tested :
Again, great work! I can obtain some decent results most of the time now. Occasionally, I ran into the problem with rough edges (zigzags) when the image resolution is low and the contrast is weak between the foreground and background. Do you have a good solution to propose? I am thinking of using some sort of anti-aliasing algorithm with Shader, but I have no clue as to where to start and how to do it in python. Any suggestions?
It might be sufficient to simply increase the number of iterations (n_small_iterations
, n_big_iterations
) and/or small_size
. The default number of iterations were sufficient for my test images. Can you share your images?
If increasing iterations does not help, a more elaborate upsampling method can be used. For example, one could use bilinear upsampling instead of nearest neighbor upsampling in the resize_nearest
function. Maybe I'll find a few minutes to implement that next week.
One of the authors of remove.bg
says they are using multiple u-nets, i.e. a form of neural network.
But I don't think that's all they do because the thickness of the alpha boundary is the same everywhere.
Maybe their neural networks are just generating a binary segmentation which they then follow up by an erosion and a blur operation?
Hi, Thanks for your work! It's very nice!
I used your approach to estimate foreground and background. But I find the time it costs is big. So I want to put it to GPU by Cupy or pytorch. And then I want to know
estimate_fb_ml()
function is base on which paper and "linear equation system" is what, and do you have some idea ?Thank you very much !