LS4GAN / uvcgan2

UVCGAN v2: An Improved Cycle-Consistent GAN for Unpaired Image-to-Image Translation
https://arxiv.org/abs/2303.16280

Required image format for getting FID and KID. #17

Open jonatelintelo opened 1 year ago

jonatelintelo commented 1 year ago

Hi,

I am trying to apply the standardized FID and KID scoring to my own dataset and generators. For this I have a question.

The evaluate_metrics(args) function handles the subdirectories of real and generated images and calculates the desired scores between all real and generated images in those subdirectories. But in what format are the images passed (.jpg or something different)?

usert5432 commented 1 year ago

Hi @jonatelintelo,

Are you referring to this function? https://github.com/LS4GAN/uvcgan2/blob/f74160381048ed753f1740c99a12892eaa827f6f/scripts/eval_fid.py#L64-L74

If so, then this is a wrapper around the torch_fidelity package, and it should support all of the image formats that torch_fidelity supports. When we evaluated the FID scores, though, we used the PNG format.
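For reference, here is a minimal sketch of driving torch-fidelity between two image directories. `calculate_metrics` and the keyword arguments below are the real torch-fidelity API; the helper function and directory names are illustrative, and the exact options used by eval_fid.py may differ.

```python
def build_metric_args(real_dir, fake_dir, cuda=False):
    # Keyword arguments accepted by torch_fidelity.calculate_metrics.
    return {
        'input1': real_dir,  # directory of real images (e.g. PNG files)
        'input2': fake_dir,  # directory of translated images
        'fid'   : True,
        'kid'   : True,
        'cuda'  : cuda,      # True to run the Inception model on a GPU
    }

def evaluate_fid_kid(real_dir, fake_dir, cuda=False):
    # Deferred import so the sketch can be read without the package installed.
    import torch_fidelity  # pip install torch-fidelity
    return torch_fidelity.calculate_metrics(**build_metric_args(real_dir, fake_dir, cuda))
```

The returned dictionary contains entries such as `'frechet_inception_distance'` and `'kernel_inception_distance_mean'`.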

jonatelintelo commented 1 year ago

> Hi @jonatelintelo,
>
> Are you referring to this function?
>
> https://github.com/LS4GAN/uvcgan2/blob/f74160381048ed753f1740c99a12892eaa827f6f/scripts/eval_fid.py#L64-L74
>
> If so, then this is a wrapper around the torch_fidelity package, and it should support all of the image formats that torch_fidelity supports. When we evaluated the FID scores, though, we used the PNG format.

Thank you for the quick answer. I indeed meant that function.

In this wrapper function from your code, is the input to calculate_metrics a single image or a directory of images? If it is the latter, does that mean you called calculate_metrics for every pair of domains you compared, such as anime to selfie and selfie to anime?

How many samples were there at minimum in each directory during your metric calculations?

usert5432 commented 1 year ago

> In this wrapper function from your code, is the input to calculate_metrics a single image or a directory of images? If it is the latter, does that mean you called calculate_metrics for every pair of domains you compared, such as anime to selfie and selfie to anime?

FID scores are evaluated between directories of images. In the case of the Anime <-> Selfie translation, we evaluate scores between the following directories:

  1. Real Anime Images vs Anime images obtained from selfies
  2. Real Selfie Images vs Selfie images obtained from anime images

> How many samples were there at minimum in each directory during your metric calculations?

We used the entire test datasets for the evaluation. The smallest test dataset belongs to Anime2Selfie, and it has just 100 anime and 100 selfie images.
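The two directory pairs above map naturally onto two metric calls, one per translation direction. A hypothetical sketch follows: none of the directory names come from the uvcgan2 repository, and the `kid_subset_size` cap reflects the fact that torch-fidelity computes KID over subsets whose size (1000 by default) must not exceed the number of available images, which matters for a 100-image test set.

```python
# Hypothetical directory layout for the two evaluation directions;
# these names are illustrative, not the repository's actual paths.
DIRECTIONS = {
    'selfie -> anime': ('test/anime_real',  'test/anime_from_selfie'),
    'anime -> selfie': ('test/selfie_real', 'test/selfie_from_anime'),
}

def evaluation_plan(directions, n_samples):
    """One (label, real_dir, fake_dir, kid_subset_size) task per direction.

    KID is estimated over random subsets; cap the subset size so it
    never exceeds the number of images in the smaller directory.
    """
    subset = min(1000, n_samples)
    return [(label, real, fake, subset)
            for label, (real, fake) in directions.items()]
```

Each task would then feed one `torch_fidelity.calculate_metrics(input1=real, input2=fake, fid=True, kid=True, kid_subset_size=subset)` call.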

jonatelintelo commented 1 year ago

Thank you for the answers, I can work further with this now.

> We used the entire test datasets for the evaluation. The smallest test dataset belongs to Anime2Selfie, and it has just 100 anime and 100 selfie images.

I also took a look at another FID implementation, pytorch-fid. For that implementation, the author recommends using at least 2048 images, owing to the dimension of the final pooling layer of the Inception network; using fewer images may yield scores that no longer correlate with visual quality. Do you know whether torch-fidelity has this same constraint, and did you take this into account for the paper?

usert5432 commented 1 year ago

Hi @jonatelintelo,

> Do you know whether torch-fidelity has this same constraint, and did you take this into account for the paper?

No, unfortunately, I am not aware whether torch-fidelity makes such a recommendation, and we did not take the pytorch-fid recommendation into account.
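For context on the 2048-image recommendation discussed above: FID fits a Gaussian (mean plus a 2048 x 2048 covariance of Inception pool features) to each image set, so with fewer than 2048 samples the covariance estimate is rank-deficient, which is why small sample counts can bias the score. The Fréchet distance itself can be sketched in plain numpy (feature extraction omitted; this is a self-contained illustration, not the torch-fidelity implementation):

```python
import numpy as np

def _sqrt_spd(mat):
    # Matrix square root of a symmetric positive semi-definite matrix.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    Tr((S1 S2)^{1/2}) is computed via the symmetric form
    (S1^{1/2} S2 S1^{1/2})^{1/2}, which has the same eigenvalues.
    """
    s1_half = _sqrt_spd(sigma1)
    inner   = _sqrt_spd(s1_half @ sigma2 @ s1_half)
    diff    = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(inner))
```

With identical distributions the distance is zero, and a unit mean shift under identity covariances gives exactly 1.0, which makes the formula easy to sanity-check.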