SHI-Labs / FcF-Inpainting

[WACV 2023] Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand
https://praeclarumjj3.github.io/fcf-inpainting/

New easy-to-use inpainting method with transformers #16

Closed: mohammadrezanaderi4 closed this issue 1 year ago

mohammadrezanaderi4 commented 1 year ago

Dear researchers, please also consider checking our newly introduced face inpainting method, which addresses the symmetry problems of general inpainting methods by using a Swin Transformer and semantics-aware discriminators. Our proposed method shows better results than several state-of-the-art methods, including LaMa, in terms of FID and a newly proposed metric focused on facial symmetry. Our paper is available at: https://www.researchgate.net/publication/366984165_SFI-Swin_Symmetric_Face_Inpainting_with_Swin_Transformer_by_Distinctly_Learning_Face_Components_Distributions

The code will also be published at: https://github.com/mohammadrezanaderi4/SFI-Swin

yzhouas commented 1 year ago

Thanks! That looks interesting. Have you tried applying it to generic inpainting rather than just faces?

mohammadrezanaderi4 commented 1 year ago

No, I haven't applied it to generic inpainting tasks, but it would be interesting to try. That would increase the network's understanding of general content such as grass, houses, and people, and might achieve better results.

Ellohiye commented 1 year ago

> No, I haven't applied it to generic inpainting tasks, but it would be interesting to try. That would increase the network's understanding of general content such as grass, houses, and people, and might achieve better results.

Author, have you retrained LaMa? The test results of the LaMa pre-trained model are much better than those in the paper; the FID on the wide mask reaches 5.4. The numbers in your paper are taken directly from the results in LaMa's supplementary material, so how do you ensure that your test-set masks are consistent with LaMa's?

mohammadrezanaderi4 commented 1 year ago

> No, I haven't applied it to generic inpainting tasks, but it would be interesting to try. That would increase the network's understanding of general content such as grass, houses, and people, and might achieve better results.

> Author, have you retrained LaMa? The test results of the LaMa pre-trained model are much better than those in the paper; the FID on the wide mask reaches 5.4. The numbers in your paper are taken directly from the results in LaMa's supplementary material, so how do you ensure that your test-set masks are consistent with LaMa's?

Because the LaMa authors did not provide their exact train and test split, and, as you said, different data splits lead to a big gap in the achieved results, we created several random splits and retrained LaMa with their public code multiple times on those splits. We then chose the split that gave results closest to those reported in their paper and supplementary material, and used it as our train and test sets.
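For what it's worth, a minimal sketch of the seeded random-splitting procedure described above (the directory layout, file extension, and 10% validation fraction are illustrative assumptions, not the actual SFI-Swin or LaMa code):

```python
import random
from pathlib import Path

def make_split(image_dir: str, seed: int, val_fraction: float = 0.1):
    """Create one random train/val split of image file names."""
    files = sorted(p.name for p in Path(image_dir).glob("*.jpg"))
    rng = random.Random(seed)            # fixed seed -> reproducible split
    rng.shuffle(files)
    n_val = int(len(files) * val_fraction)
    return files[n_val:], files[:n_val]  # (train, val)

# Generate several candidate splits, retrain LaMa on each, and keep the
# split whose FID comes closest to the published numbers.
for seed in (0, 1, 2):
    train_files, val_files = make_split("celebahq/images", seed)
```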

Ellohiye commented 1 year ago

I checked how the LaMa code divides CelebA-HQ into training/validation/test sets: the split between the training and validation sets is somewhat random, but the test set is fixed. I have also trained LaMa many times (with the training and validation sets randomly split), and the results had nothing to do with the data split. The FID of my retrained LaMa (trained three times) on the wide mask reached 5.6.

mohammadrezanaderi4 commented 1 year ago

> I checked how the LaMa code divides CelebA-HQ into training/validation/test sets: the split between the training and validation sets is somewhat random, but the test set is fixed. I have also trained LaMa many times (with the training and validation sets randomly split), and the results had nothing to do with the data split. The FID of my retrained LaMa (trained three times) on the wide mask reached 5.6.

Maybe something is wrong on your side, because as I remember, when we trained the LaMa code the results changed each run, but most of the time they were close to the LaMa paper's (FID was about 6.9 on the test set). By 'test set' I don't remember whether I checked the validation values or the test values. In the LaMa code, I think the validation and test sets have a similar number of images, and the test set is used for visualization, so maybe they reported the validation FID values. They did not use the validation information during training (there is no validation-based LR scheduler), and it seems they designed their model on the Places dataset, so reporting validation-set results also seems fair.

Ellohiye commented 1 year ago

What settings did you use when training LaMa (batch size and epochs)? You can look at the LaMa test results in the FcF paper, which reach 5.4; my test result with the LaMa pre-trained model is also around 5.4. I trained the LaMa code many times, and the test result on the wide mask reached 5.7, which is also a good result.

mohammadrezanaderi4 commented 1 year ago

As you mentioned earlier, the pretrained LaMa model reaches a different FID than what is presented in their paper and supplementary material, which means they didn't use that test set to evaluate their work. FcF and you trained the LaMa model and tested it with the official LaMa test split, whereas I searched for a train/val split by retraining LaMa multiple times until the values came close to the LaMa paper, and then used that split to evaluate my model. Is that clear?

mohammadrezanaderi4 commented 1 year ago

On top of that, I also proposed another metric, which I named the Symmetry Concentration Score; it measures how much the inpainting network focuses on the different face parts while inpainting a specific facial component or half of the face. You can also check this metric in the paper. FID seems to have weaknesses in evaluating facial symmetry because of its patch-based nature.

mohammadrezanaderi4 commented 1 year ago

As you can see, the other methods that FcF compares against have totally different results in the LaMa paper; this is further evidence that the train and test sets used by LaMa are totally different from those in the FcF paper.

yzhouas commented 1 year ago

Thanks for the discussions.

Evaluating inpainting methods against each other is never entirely clear-cut, due to the randomness of the data and the imperfection of the metrics.

We always fill in a hole, so FID/LPIPS/PSNR/SSIM may not be ideal metrics since they are computed on the entire image. Different FID implementations can also produce different values. The dataset size matters too, as do the randomness of the masks, the data split, etc. As long as you keep all of those the same across every method you compare, it should be fair enough. And better visual quality won't lie: if the visual quality is not significantly better, the numbers are meaningless.
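As one concrete illustration of the implementation point above, the clean-fid package exposes several FID variants, and the same pair of image folders can yield noticeably different scores depending on which one you pick (the folder paths below are placeholders):

```python
from cleanfid import fid  # pip install clean-fid

# Same two image folders, three FID implementations: clean-fid's
# corrected resizing, plus reimplementations of the common PyTorch
# and TensorFlow pipelines. The resulting scores generally differ.
for mode in ("clean", "legacy_pytorch", "legacy_tensorflow"):
    score = fid.compute_fid("real_images/", "inpainted_images/", mode=mode)
    print(f"{mode}: FID = {score:.2f}")
```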

Given these complications, which most papers do not reveal, a user study is the fairest way to evaluate a model. Another useful metric is one we proposed at ECCV last year.

Please check: https://arxiv.org/pdf/2208.03357.pdf https://github.com/owenzlz/PAL4Inpaint

Ellohiye commented 1 year ago

> I checked how the LaMa code divides CelebA-HQ into training/validation/test sets: the split between the training and validation sets is somewhat random, but the test set is fixed. I have also trained LaMa many times (with the training and validation sets randomly split), and the results had nothing to do with the data split. The FID of my retrained LaMa (trained three times) on the wide mask reached 5.6.

> Maybe something is wrong on your side, because as I remember, when we trained the LaMa code the results changed each run, but most of the time they were close to the LaMa paper's (FID was about 6.9 on the test set). By 'test set' I don't remember whether I checked the validation values or the test values. In the LaMa code, I think the validation and test sets have a similar number of images, and the test set is used for visualization, so maybe they reported the validation FID values. They did not use the validation information during training (there is no validation-based LR scheduler), and it seems they designed their model on the Places dataset, so reporting validation-set results also seems fair.

(1) "By test set I do not remember that I checked the validation set values or test values. In lama code, I think the validation and the test set have similar number of images and they used test set for visualization. Thus maybe they reported the validation FID values”

I'm not sure I understand you correctly: maybe you are using the validation set for testing in order to reach the numbers reported in the LaMa paper? (2) I ran a test right away: I evaluated my previously trained model on the wide masks of my previously split validation and test sets, and got an FID of 5.9 on the validation set and 6.0 on the test set. Thank you for your patience in replying, thank you very much!
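A check like the one in (2) can be scripted along these lines, assuming the retrained model's wide-mask outputs for each split are saved to separate folders (the folder names are hypothetical):

```python
from cleanfid import fid  # pip install clean-fid

# Evaluate the same retrained model on the wide-mask outputs of both
# splits; the paths are placeholders for wherever the images live.
val_fid  = fid.compute_fid("celebahq_val/real",  "celebahq_val/inpainted_wide")
test_fid = fid.compute_fid("celebahq_test/real", "celebahq_test/inpainted_wide")
print(f"val FID: {val_fid:.2f}, test FID: {test_fid:.2f}")
```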

mohammadrezanaderi4 commented 1 year ago

@Ellohiye Yes, that is exactly what I mean. If you change the training and validation sets each time by random splitting, in some cases you can get close to the LaMa paper.

mohammadrezanaderi4 commented 1 year ago

@yzhouas Thanks for your suggestion; I will check out the mentioned work. Thanks to both of you for your support and questions. @yzhouas @Ellohiye

Ellohiye commented 1 year ago

I understand what you mean, but as I mentioned above, I tested it right away and it was still far from the numbers in the paper. I don't know why my training results turn out like this. As I mentioned in my earlier question, I trained LaMa many times (randomly splitting the training and validation sets), and the worst FID on the wide mask was 6.1, so it never gets close to the paper's value. I did not change the settings: batch size = 10, epochs = 40. These fluctuating results have been very frustrating and make my next work difficult, which is why I came here to read your issue and discuss the problem with you.

Ellohiye commented 1 year ago

@yzhouas Thanks for your suggestion. I read your paper before; excellent work!