bcmi / GracoNet-Object-Placement

Official code for ECCV2022 paper: Learning Object Placement via Dual-path Graph Completion
MIT License

Question about coordinates of the foreground image. #11

Closed LoveSiameseCat closed 1 year ago

LoveSiameseCat commented 1 year ago

I am very curious about how the coordinates of the foreground image are used in func . Why is this transformation needed? In practice, we only have a background image and a foreground image with its mask. From these inputs, we can produce a composite image by blending them using the predicted position parameters. I don't think any coordinate information is available in practice. Do you agree?

WhynotHAHA commented 1 year ago

The function takes the input parameters bg_img, fg_img, fg_msk, fg_bbox, and trans. Its purpose is to adjust the position and size of the foreground in the composite image so that it blends with the background more naturally. The resulting composite image is used as input to the Graph Completion Module (GCM), together with the background image and the foreground image.
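To make the compositing step concrete, here is a minimal sketch of pasting a masked foreground into a background at a given box. It is not the repository's implementation: the name paste_foreground, the (x, y, w, h) box format, the nested-list grayscale images, and the nearest-neighbour resizing are all assumptions chosen to keep the example self-contained.

```python
def paste_foreground(bg, fg, msk, box):
    """Alpha-blend fg into a copy of bg inside box = (x, y, w, h).

    bg, fg: 2-D lists of grayscale values (hypothetical image format).
    msk:    2-D list of values in [0, 1], same size as fg.
    """
    x0, y0, w, h = box
    fg_h, fg_w = len(fg), len(fg[0])
    out = [row[:] for row in bg]  # leave the input background untouched
    for dy in range(h):
        for dx in range(w):
            # Nearest-neighbour lookup into the foreground and its mask.
            sy = min(fg_h - 1, dy * fg_h // h)
            sx = min(fg_w - 1, dx * fg_w // w)
            ty, tx = y0 + dy, x0 + dx
            # Skip pixels that fall outside the background.
            if 0 <= ty < len(bg) and 0 <= tx < len(bg[0]):
                a = msk[sy][sx]
                out[ty][tx] = a * fg[sy][sx] + (1 - a) * bg[ty][tx]
    return out


bg = [[0] * 4 for _ in range(4)]
out = paste_foreground(bg, [[10]], [[1]], (1, 1, 2, 2))
```

Here a 1×1 foreground is stretched into a 2×2 box whose top-left corner is at (1, 1); pixels outside the box keep the background value.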

As for fg_bbox: when the foreground and background are resized, the background is first resized to 256×256, and the foreground is then resized while keeping its aspect ratio. At this point the foreground will most likely not cover the full 256×256 area, so the short side is padded with black on both sides. The fg_bbox here refers to the bounding box of the resized foreground inside the 256×256 canvas.
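The padding arithmetic described above can be sketched as follows. This is an illustration only: the helper name letterbox_bbox, the (x, y, w, h) box format, and the assumption that the foreground is centred with equal padding on both sides are not taken from the repository.

```python
def letterbox_bbox(fg_w, fg_h, canvas=256):
    """Return (x, y, w, h) of an aspect-preserving resized foreground
    centred inside a canvas x canvas square (hypothetical helper)."""
    long_side = max(fg_w, fg_h)
    # Scale so the long side fills the canvas; the short side shrinks with it.
    new_w = max(1, round(fg_w * canvas / long_side))
    new_h = max(1, round(fg_h * canvas / long_side))
    # Split the leftover space evenly between the two sides of the short edge.
    x = (canvas - new_w) // 2
    y = (canvas - new_h) // 2
    return x, y, new_w, new_h


print(letterbox_bbox(100, 200))  # → (64, 0, 128, 256)
```

For a 100×200 foreground, the height fills the canvas and the width becomes 128, leaving 64 pixels of black padding on the left and right.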

trans holds the parameters of the affine transformation applied to the foreground image. First, the attention feature of the foreground object and a random vector are concatenated along the first dimension and passed through a regression network that predicts the transformation parameters. The raw predictions can take any value, so torch.tanh is applied to map them to the range [-1, 1], and the result is then shifted and scaled to the range [0, 1]. This keeps the transformation parameters in a reasonable range for the subsequent image transformation. trans is applied as an affine transformation when the blended image is generated.
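A small sketch of the range mapping, using math.tanh as a stand-in for torch.tanh so that it runs without PyTorch. The second function is purely hypothetical: it assumes trans is a triple (scale, tx, ty) and shows one plausible way such values could be turned into a placement box. The repository may interpret the parameters differently.

```python
import math


def squash_to_unit(raw):
    """Map unbounded regression outputs to [0, 1]: tanh, then shift and scale."""
    return [(math.tanh(v) + 1.0) / 2.0 for v in raw]


def trans_to_box(trans, fg_w, fg_h, bg_w=256, bg_h=256):
    """Hypothetical reading of trans = (scale, tx, ty), each in [0, 1]."""
    s, tx, ty = trans
    # Largest factor at which the foreground still fits inside the background.
    fit = min(bg_w / fg_w, bg_h / fg_h)
    w = s * fit * fg_w
    h = s * fit * fg_h
    # tx and ty slide the top-left corner across the remaining free space.
    x = tx * (bg_w - w)
    y = ty * (bg_h - h)
    return x, y, w, h


trans = squash_to_unit([0.0, 0.0, 0.0])  # tanh(0) = 0, so each value is 0.5
box = trans_to_box(trans, fg_w=128, fg_h=256)
```

Because every parameter stays in [0, 1], the resulting box can never leave the background under this interpretation, which is one reason to bound the regression output in this way.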

LoveSiameseCat commented 1 year ago

Thank you for your reply. Although I am still somewhat confused about the bbox after your explanation, I have found a suitable way to adapt this replacement function to my project. Anyway, thank you for your response.

Nomination-NRB commented 1 year ago

Hello, could you describe the way you adapted the replacement function? I am also looking for one.