ankanbhunia / PIDM

Person Image Synthesis via Denoising Diffusion Model (CVPR 2023)
https://ankanbhunia.github.io/PIDM
MIT License
483 stars 62 forks

results and evaluation for 512x352 images #40

Open ZihaoW123 opened 1 year ago

ZihaoW123 commented 1 year ago

Hi authors, your work is impressive. Thanks for sharing the code base.

However, I find that the evaluation code in "utils/metrics.py" only supports 256x176 images, and the FID it calculates for 512x352 images seems to be incorrect.

It would greatly help the community if you could share 512x352 generated image results and the evaluation code for 512x352 images. Looking forward to your kind response.

ankanbhunia commented 1 year ago

Could you please explain your issue regarding the FID calculated by "utils/metrics.py"?

ZihaoW123 commented 1 year ago

Using "utils/metrics.py" to calculate the FID for 256x176 images works correctly.

To evaluate 512x352 images, I replaced cv2.resize(imread(str(fn)).astype(np.float32), (176, 256)) with cv2.resize(imread(str(fn)).astype(np.float32), (352, 512)) in "utils/metrics.py".

I find that InceptionV3 contains a resize function that scales the image from 512x352 down to 299x299 (screenshot omitted). In order to scientifically test the quality of 512x352 images, I think the resolution of the image should not be reduced during evaluation.

So I want to know your script code for evaluating 512x352 images. Thanks.

ankanbhunia commented 1 year ago

The Inception network takes inputs of size (299, 299), so we need to resize the images to this size before calculating FID.

You are right that for higher-resolution images it does not quite make sense to reduce the dimensions. However, this is the standard protocol, and other papers evaluate in the same way.
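The resize step being discussed can be sketched as follows. This is a minimal illustration in the style of common FID implementations (e.g. the pytorch-fid InceptionV3 wrapper), not this repo's actual code; the function name inception_preprocess is hypothetical:

```python
import torch
import torch.nn.functional as F

def inception_preprocess(x: torch.Tensor) -> torch.Tensor:
    """Bilinearly resize a batch of N x 3 x H x W images to 299x299,
    the fixed input size of InceptionV3.

    Both 256x176 and 512x352 generated images end up at the same
    resolution before feature extraction, which is why the protocol
    is comparable across resolutions even though detail is lost.
    """
    return F.interpolate(x, size=(299, 299), mode="bilinear", align_corners=False)
```

So under this protocol, any two evaluation scripts that feed the same images through this preprocessing should produce the same Inception features, regardless of the generation resolution.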

ZihaoW123 commented 1 year ago

So I'm wondering if there is a difference between my code and yours for testing 512x352 images.

nicolasugrinovic commented 9 months ago

@ZihaoW123 could you match the results for 256x176 images obtained using the file "utils/metrics.py" with the tables in the paper?