TencentARC / CustomNet

Apache License 2.0

A question about the training data generation in sec 3.3 of the paper #4

Closed ericwudocomoi closed 4 months ago

ericwudocomoi commented 9 months ago

Thanks for the very nice work!

I have a question about how the training data was constructed. In Section 3.3 you state:

"To alleviate this problem, we propose a training data construction pipeline that is the reverse of the above-mentioned way, i.e., directly utilizing natural images as the target image and extracting objects from the image as the reference...".

My question is: although this avoids the unnatural "copy and paste" of object images onto a background, how do you obtain a proper background image for training? (Segmenting/extracting the object of interest from a natural image leaves a "hole"; did you fill the hole with some inpainting method?)

jiangyzy commented 9 months ago

The original natural image is used as the target image during training. The input object image comes from that natural image; we then obtain a novel view of the object, which is fed into the network to generate the target. The target is the original natural image, so there is no hole.

ericwudocomoi commented 9 months ago

So the training data for the "Composition branch" in Fig. 2 cannot be generated as described in Sec. 3.3 (since we would be missing the background image), is that right?

jiangyzy commented 9 months ago

In the composition branch, the natural image masked by a bounding box is used as the background image condition. Filling the background with an inpainting method (e.g., LaMa) is another option we tried earlier.
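As a minimal sketch of the bounding-box masking described above (assuming NumPy arrays and a simple zero-fill for the masked region; this is not the authors' actual preprocessing code, and in practice the hole could instead be filled by an inpainting model such as LaMa):

```python
import numpy as np

def make_background_condition(image: np.ndarray, bbox: tuple) -> np.ndarray:
    """Zero out the object's bounding-box region so the network must
    synthesize the object there, conditioned on the rest of the scene.

    bbox is (x0, y0, x1, y1) in pixel coordinates (hypothetical convention).
    """
    x0, y0, x1, y1 = bbox
    bg = image.copy()
    bg[y0:y1, x0:x1] = 0  # masked region; an inpainting model could fill this instead
    return bg

# Toy example: an 8x8 white RGB "image" with the object in a 4x4 box.
img = np.full((8, 8, 3), 255, dtype=np.uint8)
cond = make_background_condition(img, (2, 2, 6, 6))
```

The target for this sample would be the original natural image itself, so no hole ever appears in the supervision signal.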

ericwudocomoi commented 9 months ago

Clear enough, thank you!