The foreground is only equal to the source on regions where alpha = 1. But for semitransparent regions, it is not, because part of the original background will leak through. These regions are usually hair, silhouette, and motion blur.
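To make that concrete, here is a toy numeric sketch (the pixel values are made up, not from the dataset) of how the original background tints a semi-transparent pixel:

```python
# Toy example with invented values: one semi-transparent hair pixel.
# A camera pixel mixes foreground and background by alpha:
#   I = alpha * F + (1 - alpha) * B
alpha = 0.5                  # half-transparent hair strand
F = (0.9, 0.8, 0.7)          # true hair color (RGB, light brown)
B = (0.0, 1.0, 0.0)          # original background (pure green)

I = tuple(alpha * f + (1 - alpha) * b for f, b in zip(F, B))
print(I)  # (0.45, 0.9, 0.35) -- the source pixel is visibly tinted green
```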
But won't the foreground be trained to match the original pixels for all alpha > 0, not just alpha = 1?
No, it won't. The dataset provides the ground-truth foreground F and alpha α. We composite them onto a background B to synthesize the source input I = αF + (1 - α)B. The model predicts foreground F' and alpha α'. The loss on F' is computed against the ground-truth F, not against the source pixels I.
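A minimal sketch of that training step, assuming a model that returns a (foreground, alpha) pair; the function and variable names here are hypothetical, and the paper's actual objective includes more loss terms:

```python
import torch

def training_step(model, fgr, pha, bgr):
    """One matting training step. fgr/pha are ground-truth foreground
    and alpha from the dataset; bgr is a random background image used
    only to synthesize the network input."""
    # Synthesize the source input: I = alpha*F + (1 - alpha)*B
    src = pha * fgr + (1 - pha) * bgr

    # Predict foreground and alpha from the composite
    fgr_pred, pha_pred = model(src)

    # Supervise F' against the ground-truth F (not the source pixels),
    # and only where the foreground is defined, i.e. alpha > 0
    mask = (pha > 0).float()
    fgr_loss = (mask * (fgr_pred - fgr).abs()).mean()
    pha_loss = (pha_pred - pha).abs().mean()
    return fgr_loss + pha_loss
```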
Oh, so we use the properly-extracted foregrounds from the datasets and the model directly learns to remove the background in those situations you described (hair strands, motion blur, etc.). I missed that, sorry.
Thanks for the explanation!
Hello! First of all, good job with the paper! It is nicely written and explains a lot of concepts pretty well. However, I am still a little puzzled about why predicting the foreground (or the foreground residual, in this case) is necessary in the pipeline. Consider this example from the demo:
For the composition (the final step), why do we use the pixels from the upsampled foreground and not from the original image? They should be identical anyway, because we explicitly train the coarse foreground prediction to replicate the pixels from the original image in the alpha mask region (formula 2 in the paper).
A possible answer is mentioned in Issue#19, but it's unclear to me what the "background color spill onto partial-opacity hairs and edges" looks like and how the foreground prediction branch mitigates this issue.
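Here is a small numeric sketch (invented values, continuing the toy pixel from above) of how I understand the two compositing choices would differ over a new white background:

```python
# Invented values: one semi-transparent pixel composited over a NEW
# white background, with and without a recovered foreground.
alpha = 0.5
F_true = (0.9, 0.8, 0.7)   # true hair color
B_old  = (0.0, 1.0, 0.0)   # original green background
B_new  = (1.0, 1.0, 1.0)   # new white background

def mix(a, x, y):
    """Alpha-blend x over y: a*x + (1 - a)*y, per channel."""
    return tuple(a * xi + (1 - a) * yi for xi, yi in zip(x, y))

src = mix(alpha, F_true, B_old)      # what the camera saw: (0.45, 0.9, 0.35)

out_src = mix(alpha, src, B_new)     # (a) reuse source pixels as foreground
print(out_src)                       # (0.725, 0.95, 0.675) -- greenish halo

out_fgr = mix(alpha, F_true, B_new)  # (b) use the recovered foreground
print(out_fgr)                       # (0.95, 0.9, 0.85)    -- clean result
```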
I would greatly appreciate an explanation and/or just a side-by-side comparison of 2 images (original vs predicted foreground).
Thank you in advance!