Hi, thanks for your attention.
Can you share some failure cases with me? If the foreground color is very similar to the background, blurred boundaries are expected. As for the image in the GitHub README, we converted it to GIF format, so I think its quality is too low to get good results.
Standard Lena image
@mosvlad I am sorry, but such a wrong output is expected from our model due to (1) similar foreground and background colors; (2) our limited training data.
Thanks for replying @ZHKKKe. It looks like the project is targeted at video matting. Is there anything we can do to optimize image (portrait) matting? (I volunteer to work on this.)
@zzmao The main problem of our current model is its relatively poor performance in portrait semantic estimation. I think one possible solution is to improve the performance of the backbone model, i.e., the MobileNetV2, in MODNet.
@ZHKKKe What should be the approach to improve the performance of the backbone model, i.e., the MobileNetV2? Also, can you tell me what type of data you used for training? And can you please tell me the approach to train on our own dataset?
@alan-ai-learner Q1: What should be the approach to improve the performance of the backbone model, i.e., the MobileNetV2? You can replace the MobileNetV2 with a more powerful model, e.g., DeepLabV3+. Besides, you may need more labeled training data. You may be interested in the large labeled dataset that will be released soon by BackgroundMattingV2.
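This is not the repo's actual interface, just a rough sketch of what a heavier multi-scale encoder wrapper could look like (using a plain ResNet-50 from torchvision instead of a full DeepLabV3+ for brevity; the real backbone wrappers and the feature strides the decoder expects should be checked in the MODNet source):

```python
import torch.nn as nn
from torchvision.models import resnet50

class ResNet50Backbone(nn.Module):
    """Hypothetical drop-in encoder: returns features at strides 2/4/8/16/32,
    which is roughly what a MobileNetV2-style backbone wrapper exposes."""
    def __init__(self, pretrained=True):
        super().__init__()
        net = resnet50(pretrained=pretrained)
        self.stage1 = nn.Sequential(net.conv1, net.bn1, net.relu)  # stride 2
        self.stage2 = nn.Sequential(net.maxpool, net.layer1)       # stride 4
        self.stage3 = net.layer2                                    # stride 8
        self.stage4 = net.layer3                                    # stride 16
        self.stage5 = net.layer4                                    # stride 32

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)
        f5 = self.stage5(f4)
        return [f1, f2, f3, f4, f5]
```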
Q2: can you tell me what type of data you used for training? Each of our supervised training samples is a pair of (RGB image, labeled matte). The unlabeled samples used in our SOC adaptation are the RGB images.
Q3: Can you please tell me the approach to train on our own data set? Our training code will be released next month. The code will contain a template for implementing the new dataloader. It allows you to train on your own datasets. We will also provide a guideline on how to do this.
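Until the official template is out, a minimal sketch of an (RGB image, alpha matte) pair loader might look like this (the file layout, image size, and normalization are assumptions, not the released code):

```python
import os
import cv2
import torch
from torch.utils.data import Dataset

class MatteDataset(Dataset):
    """Hypothetical (RGB image, alpha matte) pair loader; the official
    dataloader template may differ once the training code is released."""
    def __init__(self, image_dir, matte_dir, size=512):
        self.names = sorted(os.listdir(image_dir))
        self.image_dir, self.matte_dir, self.size = image_dir, matte_dir, size

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = cv2.cvtColor(cv2.imread(os.path.join(self.image_dir, name)),
                             cv2.COLOR_BGR2RGB)
        matte = cv2.imread(os.path.join(self.matte_dir, name), cv2.IMREAD_GRAYSCALE)
        image = cv2.resize(image, (self.size, self.size))
        matte = cv2.resize(matte, (self.size, self.size))
        # scale the image to [-1, 1] (check against the official preprocessing)
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0 * 2 - 1
        matte = torch.from_numpy(matte).unsqueeze(0).float() / 255.0  # [0, 1]
        return image, matte
```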
Thanks, but BackgroundMattingV2 uses a different approach: we need to pass two images, one with the subject and one without the subject (background only). In MODNet we only pass one image, so how can we utilize their dataset?
@alan-ai-learner Yes. I think their dataset only consists of the labeled foregrounds. They use the images from other datasets, like COCO, to composite the training samples. Therefore, their dataset can be used to train MODNet (You only need to input the composited images for training, i.e., you do not need to input the separate background images).
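For reference, the compositing step is usually just `I = alpha * F + (1 - alpha) * B`. A minimal sketch, assuming the dataset provides foregrounds with their alpha mattes and you supply the background images (e.g., from COCO):

```python
import cv2
import numpy as np

def composite(fg, alpha, bg):
    """Composite a labeled foreground onto an arbitrary background image:
    I = alpha * F + (1 - alpha) * B. The alpha matte stays as the training label."""
    bg = cv2.resize(bg, (fg.shape[1], fg.shape[0]))
    alpha = alpha.astype(np.float32)[..., None] / 255.0  # HxWx1 in [0, 1]
    image = alpha * fg.astype(np.float32) + (1.0 - alpha) * bg.astype(np.float32)
    return image.astype(np.uint8)
```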
@ZHKKKe Do you think there will be an improvement in accuracy by combining supervised training with the estimation of foreground color?
@ZHKKKe got it
@newjavaer I think that might help, but I'm not sure about it. The lack of labeled data is a more crucial problem. The current version of MODNet mostly fails at portrait semantic estimation rather than at detail prediction.
@ZHKKKe Why do you think there are more errors in semantic estimation, apart from the lack of data?
Many errors are caused by recognizing clothes as part of the person. Maybe this can be improved by adding a penalty during training. I have collected some person matting data from several semantic datasets and the new CelebAMask-HQ dataset; it may improve the results.
@QuantumLiu By adding some background images as negative samples to the training?
@newjavaer Semantic estimation is a high-level vision task. It is much more difficult than detail prediction (low-level vision).
@QuantumLiu Yes. If we use the data from semantic datasets to train the Low-Resolution Branch of MODNet, we should get more stable results.
The solution proposed by @newjavaer is also great. We did not consider negative samples during training since it is an engineering problem.
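As a sketch of both ideas (not the actual training code): segmentation masks can be turned into coarse targets for the Low-Resolution Branch, and background-only images can be paired with an all-zero matte as negative samples. The scale factor and blurring below are assumptions.

```python
import cv2
import numpy as np
import torch

def coarse_semantic_target(mask, scale=16, blur_ksize=5):
    """Turn a binary person mask (from a segmentation dataset) into a coarse
    target for the Low-Resolution Branch: downsample, then blur to soften edges.
    Hypothetical helper -- the released training code may do this differently."""
    h, w = mask.shape[:2]
    small = cv2.resize(mask.astype(np.float32), (w // scale, h // scale),
                       interpolation=cv2.INTER_AREA)
    small = cv2.GaussianBlur(small, (blur_ksize, blur_ksize), 0)
    return torch.from_numpy(small).unsqueeze(0)  # 1 x H/scale x W/scale in [0, 1]

def negative_sample(background_image):
    """Background-only image paired with an all-zero matte, as suggested above."""
    zero_matte = np.zeros(background_image.shape[:2], dtype=np.float32)
    return background_image, zero_matte
```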
Please feel free to reopen this issue if you have further questions.
Has anyone tried replacing the Low-Resolution Branch backbone? How are the results? @QuantumLiu @mosvlad @zzmao cc @ZHKKKe
@syfbme The performance may be further improved. Please refer to https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting
Hi,
Thanks for this great project. I tried your Colab demo for image matting, and it looks like the boundary is not clear enough for some inputs (including the one in the GitHub README).
Is there any way to improve the image matting accuracy?