LoveSiameseCat closed this issue 1 year ago.
Fun
To compute fg_bbox, the foreground and background are resized as follows. First the background is resized to 256×256, and then the foreground is resized while keeping its aspect ratio. In that case the resized foreground will very likely not cover the whole 256×256 area, so black padding is added on both sides of its shorter dimension. fg_bbox therefore refers to the bounding box of the resized foreground inside the 256×256 canvas.
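A minimal sketch of how such a box could be computed, assuming the foreground is scaled so that its longer side becomes 256 and the padding is split evenly between the two sides of the shorter dimension. The function name `letterbox_fg_bbox` and the `(x1, y1, x2, y2)` box format are hypothetical and are not taken from the repository.

```python
def letterbox_fg_bbox(fg_h: int, fg_w: int, size: int = 256) -> tuple:
    """Hypothetical helper: box of an aspect-preserving resized foreground
    inside a size x size canvas, returned as (x1, y1, x2, y2)."""
    # Scale so the longer side fits the square canvas exactly (assumption)
    scale = size / max(fg_h, fg_w)
    new_h = round(fg_h * scale)
    new_w = round(fg_w * scale)
    # Black padding is split across both sides of the shorter dimension
    pad_top = (size - new_h) // 2
    pad_left = (size - new_w) // 2
    # The resized foreground occupies the region between the paddings
    return (pad_left, pad_top, pad_left + new_w, pad_top + new_h)
```

For example, a 100×200 (height × width) foreground becomes 128×256, so 64 rows of black padding are added above and below it and the box is `(0, 64, 256, 192)`.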
Trans holds the parameters of the affine transformation applied to the foreground image. First, the attention feature of the foreground object and a random vector are concatenated along the first dimension and passed through a regression network that predicts the transformation parameters. Because these raw predictions can take any value, torch.tanh is applied to map them to [-1, 1], and the result is then shifted and scaled to [0, 1]. This keeps the transformation parameters in a reasonable range for the subsequent image transformation. Trans is then applied as an affine transformation when the blended image is generated.
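The following PyTorch sketch illustrates the pipeline described above under several assumptions. The class `TransRegressor`, the function `blend_with_trans`, the two-layer MLP, the noise size, and reading trans as `(scale, center_x, center_y)` are all hypothetical and not the repository's actual code. "First dimension" is taken to mean `dim=1`, the feature axis after the batch axis. The affine step uses the real `torch.nn.functional.affine_grid` and `grid_sample` calls.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransRegressor(nn.Module):
    """Hypothetical head that predicts transformation parameters in [0, 1]."""

    def __init__(self, feat_dim: int, noise_dim: int, n_params: int = 3, hidden: int = 128):
        super().__init__()
        self.noise_dim = noise_dim
        # Small MLP used as the regression network (assumed architecture)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_params),
        )

    def forward(self, attn_feat: torch.Tensor) -> torch.Tensor:
        # attn_feat: (batch, feat_dim) attention feature of the foreground object
        noise = torch.randn(attn_feat.size(0), self.noise_dim, device=attn_feat.device)
        x = torch.cat([attn_feat, noise], dim=1)  # concatenate feature and random vector
        raw = self.mlp(x)                         # unbounded predictions
        return (torch.tanh(raw) + 1) / 2          # [-1, 1] -> [0, 1]


def blend_with_trans(fg, mask, bg, trans, min_scale=0.1):
    """Warp fg/mask with an affine transform built from trans and composite onto bg.

    fg: (B, 3, H, W), mask: (B, 1, H, W) in [0, 1], bg: (B, 3, H, W),
    trans: (B, 3) in [0, 1], hypothetically read as (scale, center_x, center_y).
    """
    s = min_scale + (1 - min_scale) * trans[:, 0]  # keep the scale away from zero
    tx = trans[:, 1] * 2 - 1                       # [0, 1] -> normalized [-1, 1]
    ty = trans[:, 2] * 2 - 1
    zeros = torch.zeros_like(s)
    # affine_grid maps output coords to input coords, so use the inverse
    # placement: input = (output - t) / s
    theta = torch.stack([
        torch.stack([1 / s, zeros, -tx / s], dim=1),
        torch.stack([zeros, 1 / s, -ty / s], dim=1),
    ], dim=1)  # (B, 2, 3)
    grid = F.affine_grid(theta, bg.shape, align_corners=False)
    # Regions outside the warped foreground are filled with zeros
    fg_w = F.grid_sample(fg, grid, padding_mode="zeros", align_corners=False)
    mask_w = F.grid_sample(mask, grid, padding_mode="zeros", align_corners=False)
    # Alpha-composite the warped foreground over the background
    return mask_w * fg_w + (1 - mask_w) * bg
```

With trans = `[1.0, 0.5, 0.5]` the transform is (almost exactly) the identity, so a full mask returns the foreground unchanged. Because every step is differentiable, gradients from the composite can flow back into the regression network.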
Thank you for your reply. Although I am still somewhat confused about the bbox after your explanation, I have found a suitable way to adapt this replacement function to my project. Anyway, thank you for your response.
Hello, could you describe the way you adapted the replacement function? I am also looking for one.
I am very curious about how the coordinates of the foreground image are used in func. Why is this transformation needed? In practice, we only have a background image and a foreground image with its mask. From these inputs, we can obtain a composite image by blending with the predicted position parameters, so I don't think any coordinate information is available in practice. Do you agree?