Closed DanBigioi closed 2 years ago
Just to answer my 4th question: I was making a mistake. I was calculating mask_img like this: mask_img = img*(255. - mask) + mask
When instead I should have omitted that . after the 255, as it was completely screwing up my result. Very stupid mistake hahaha.
Hi, thanks for your attention.
Awesome, thank you for your answers, they clear everything up for me. Regarding your 4th answer, do you think something like this would do the trick?:
mask:
masked_image (please ignore the colour):
I didn't quite understand the use of this mask. Are you trying to recover the full image with the given lip outline and part of the face? But in any case, it should be possible to recover the face. It depends on how much you train.
Yup! Given the lip outline + part of the face, I want it to output the full face. Ideally, once it's trained, at the inference stage I can choose any lip outline for a particular face, and the network should generate the full face with the lips shaped according to that outline.
The application is to generate a set of facial landmarks from audio and, using these generated landmarks, render a photorealistic video based on the new positions of the lips.
I understand.
Got it! thanks so much for your help, will close the issue now 😄
Specifically,
I'm looking at line 58 in Palette-Image-to-Image-Diffusion-Models/data/dataset.py
I'm trying to set up my own custom mask function to process my dataset for a variation of the image2image cropping task, and this is what I have so far.
I have code that generates a custom mask like this for a given image. The mask has shape 256×256×3:
and a target image like this:
For the get_item method in data/dataset.py, I have some questions regarding the following lines:
```python
img = self.tfs(self.loader(path))
mask = self.get_mask()
cond_image = img*(1. - mask) + mask*torch.randn_like(img)
mask_img = img*(1. - mask) + mask
```
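For reference, here is a minimal NumPy sketch of what those two composites do (mirroring the torch expressions), assuming the image is in [-1, 1] after the transforms and the mask is {0, 1} with a channel dimension that broadcasts against the image; the shapes are toy stand-ins, not the repo's actual 256×256 setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a (3, 4, 4) "image" in [-1, 1] and a {0, 1} mask
# with a 2x2 "hole" in the middle rows/columns.
img = rng.uniform(-1, 1, size=(3, 4, 4))
mask = np.zeros((1, 4, 4))
mask[:, 1:3, 1:3] = 1.

# cond_image: keep the image where mask == 0, fill the hole with Gaussian
# noise -- this is the conditioning input the network sees during training.
cond_image = img * (1. - mask) + mask * rng.standard_normal(img.shape)

# mask_img: same keep/fill split, but the hole is filled with the constant 1
# (white), which is useful for visualising the region being inpainted.
mask_img = img * (1. - mask) + mask
```

Outside the hole both composites are identical to the original image; inside it, cond_image is pure noise while mask_img is constant.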
1) What does that tfs method do, and is it necessary to use it on my ground truth image?
2) For my own get_mask function, my mask has shape (h, w, 3). Does this need to be (h, w, 1) instead to account for the "hole" and valid regions? If so, how do I work around this so that I can include information about the lip position in the mask, like I have in image 1?
3) What does the cond_image calculation do, and why is it done?
4) Why is the masked image calculated this way, and not by doing a bitwise-and multiplication between the img and mask? If I try to do this using my own data, the result is completely messed up.
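One way to reconcile an (h, w, 3) drawing with an (h, w, 1) hole convention is to collapse the RGB mask to a single binary channel for the compositing, and keep the drawing itself as extra conditioning channels. This is only an assumption-laden sketch, not the repo's code; split_mask is a hypothetical helper:

```python
import numpy as np

def split_mask(rgb_mask):
    """Hypothetical helper: split an (h, w, 3) drawing into
    (a) an (h, w, 1) binary hole mask for the img*(1-mask)+... composites, and
    (b) the drawing as float channels to concatenate as extra conditioning."""
    # Treat any non-black pixel as part of the masked ("hole") region.
    hole = (rgb_mask.sum(axis=-1, keepdims=True) > 0).astype(np.float32)
    return hole, rgb_mask.astype(np.float32)

# Toy 4x4 RGB mask: one red lip-outline pixel plus a white region to inpaint.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[1, 1] = (255, 0, 0)   # lip outline drawn in red
rgb[2:4, 2:4] = 255       # region to inpaint drawn in white
hole, hint = split_mask(rgb)
```

The hole mask then broadcasts cleanly against a (c, h, w) or (h, w, c) image, while the lip-position information survives in the separate hint channels.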
Additionally, I have one extra question. Because I want to go from a masked image with a drawing of the lips to the ground truth image, is this better suited to an image colorization task?
Thanks!