deep-floyd / IF

What is inpainting_mask in the use of Zero-shot Inpainting? #112

Open KeyaoZhao opened 1 year ago

KeyaoZhao commented 1 year ago

What is inpainting_mask in Zero-shot Inpainting? Should we mask the raw_pil_image first, so that the model inpaints the masked part? Thanks a lot!

phalexo commented 1 year ago

I think I have the same question. I would like to add objects to a preexisting photo scene. It seems onerous to have to define a mask; I'd want the added objects simply placed "organically" in correct, plausible locations.

KeyaoZhao commented 1 year ago

> I think I have the same question. I would like to add objects to a preexisting photo scene. It seems onerous to have to define some mask, I'd want the added objects simply placed "organically" in the correct/plausible locations.

I tried setting 'support_pil_img' to the original image (without a mask applied) and 'inpainting_mask' to a one-channel mask image. But the result of if_II_kwargs is exactly the same as 'support_pil_img', and the prompt has no influence on the output. How should I fix this?

AnranXu commented 1 year ago

Hello, I have the same problem. I tried mask shapes [h,w], [h,w,3], and [1,h,w,3], but all of them failed. Did you figure out what data type and shape the mask should be?

KeyaoZhao commented 1 year ago

> Hello, I also have the same problem. I have tried to make the shape of the mask to be [h,w], [h,w,3], and [1,h,w,3] but failed all the cases. Did you figure out what the data type and shape the mask should be?

I still have no idea how to reproduce the effect shown in the inpainting example. But if you want to add text to an image, you can try TextDiffuser.

AnranXu commented 1 year ago

Thanks. If I figure out how to make it work, I will share it here.

pierrot-lc commented 1 year ago

> Hello, I also have the same problem. I have tried to make the shape of the mask to be [h,w], [h,w,3], and [1,h,w,3] but failed all the cases. Did you figure out what the data type and shape the mask should be?

I managed to make it work after a deep look at the code. What you should provide is a torch.FloatTensor mask of shape [1, 3, h, w]. Set the mask values to 1 where you want the model to modify the image, and to 0 where the model should leave the pixels untouched.

For this solution to work properly, you'll also need to apply the patch available in pull request #64.

Furthermore, if your image has an aspect ratio that is not well-rounded, the shape of the image generated in the first stage may differ from the shape of the mask and support noise. To address this issue, I have proposed a fix in pull request #125.
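
For anyone starting from a one-channel [h, w] mask, here is a minimal sketch of building a mask in the [1, 3, h, w] float layout described above. The helper names `make_box_mask` and `expand_mask` are hypothetical (they are not part of the IF codebase), and the 256x256 size is only an example; I am assuming h and w should match the support image.

```python
import torch


def make_box_mask(height: int, width: int, box: tuple[int, int, int, int]) -> torch.Tensor:
    """Hypothetical helper: 1 inside the box (region to repaint), 0 elsewhere (keep)."""
    top, left, bottom, right = box
    mask = torch.zeros(1, 3, height, width, dtype=torch.float32)
    mask[:, :, top:bottom, left:right] = 1.0
    return mask


def expand_mask(mask_hw: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn a one-channel [h, w] mask into a [1, 3, h, w] float mask."""
    # Add batch and channel axes, then copy the single channel three times.
    return mask_hw.float()[None, None].repeat(1, 3, 1, 1)


# Example: a 256x256 image (assumed size) where the central 128x128 square is repainted.
inpainting_mask = make_box_mask(256, 256, box=(64, 64, 192, 192))

# The same mask, starting from a boolean [h, w] array.
mask_hw = torch.zeros(256, 256, dtype=torch.bool)
mask_hw[64:192, 64:192] = True
same_mask = expand_mask(mask_hw)

print(inpainting_mask.shape, inpainting_mask.dtype)
```

The resulting tensor is what gets passed as inpainting_mask. This only covers the shape and dtype; the patches from the pull requests mentioned above are still needed.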