How to correctly calculate the FID scores for variable-length word images?

ganji15 commented 4 years ago

Hi @rlit, thanks for sharing your amazing work. I have read your paper, but I am confused about the FID calculation in Table 1.

To calculate FID, each image is fed into the Inception-v3 CNN. After global average pooling, the output feature map is reduced to a feature vector (typically 2048-dimensional, I guess).
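
For reference, here is a minimal sketch of the distance itself, assuming the pooled Inception-v3 features have already been extracted into two arrays of shape (N, D). The function name `frechet_distance` and the NumPy-only eigenvalue shortcut are illustrative choices, not code from the paper or the repository.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two (N, D) feature arrays.

    Assumes D >= 2 and that the features were already extracted
    (e.g. pooled Inception-v3 activations); extraction is not shown here.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)

    # Tr((sigma_r sigma_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_f, which are real and non-negative for
    # positive semi-definite covariances (clipped to absorb round-off).
    eigvals = np.linalg.eigvals(sigma_r @ sigma_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_f) - 2.0 * tr_sqrt)
```

Because the features are pooled to a fixed-length vector, this step does not depend on image width; the question below is about what happens before the network sees the image.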

In the paper "Adversarial Generation of Handwritten Text Images Conditioned on Sequences", the authors generate fixed-size word images, so it is clear how to calculate FID in their case: real and generated images both seem to be padded to the same shape.

However, your work (i.e., ScrabbleGAN) can generate variable-length word images, so I am confused about how FID is calculated when real and generated images have different widths.

My question is how to correctly calculate the FID scores for variable-length word images?

The possible solutions might be:

1. Padding all real images and generated images to a given maximum width?
2. Or splitting each variable-length word image into fixed-size patches (e.g., each patch has a fixed size of 32x32), and then calculating FID scores on those patches?
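
To make the patch-based option concrete, here is a minimal sketch, assuming grayscale images stored as 2-D NumPy arrays of height 32. The helper name `split_into_patches`, the non-overlapping layout, and padding the last patch with white (255) are all assumptions made for illustration.

```python
import numpy as np

def split_into_patches(img, patch_width=32, pad_value=255):
    """Split a (height, width) image into non-overlapping patches.

    The last patch is right-padded with pad_value when the width is not
    a multiple of patch_width (an assumption, not a documented choice).
    """
    h, w = img.shape
    n_patches = -(-w // patch_width)  # ceiling division
    padded = np.full((h, n_patches * patch_width), pad_value, dtype=img.dtype)
    padded[:, :w] = img
    return [padded[:, i * patch_width:(i + 1) * patch_width]
            for i in range(n_patches)]
```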

How did you calculate the FID score in your work?

Any suggestions and discussions are greatly appreciated. Thanks in advance.

sharonFogel commented 4 years ago

Hi, our approach to calculating the FID is option 1: we chose a maximum width, discarded images wider than that, and padded narrower images to that width before calculating the FID. It is also worth mentioning that, to generate the images, we used the same dictionary as the dataset we compare against.
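
The discard-and-pad step described above could look roughly like the sketch below. It assumes grayscale images as 2-D NumPy arrays of equal height; the helper name `filter_and_pad`, padding on the right, and the white (255) pad value are assumptions, since the reply does not specify them.

```python
import numpy as np

def filter_and_pad(images, max_width, pad_value=255):
    """Drop images wider than max_width and pad the rest to max_width.

    Padding side and value are assumptions made for illustration.
    """
    kept = []
    for img in images:  # each img has shape (height, width)
        h, w = img.shape
        if w > max_width:
            continue  # discard images that exceed the maximum width
        padded = np.full((h, max_width), pad_value, dtype=img.dtype)
        padded[:, :w] = img  # original content stays on the left
        kept.append(padded)
    return kept
```

The same function would be applied to both the real and the generated set, so that every image fed to the Inception network has the same shape.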

ganji15 commented 4 years ago

@sharonFogel Thanks very much for your answer; it is clear to me now.

amzn / convolutional-handwriting-gan, issue #6