Problems with the metric calculation

Sainthousand commented 2 months ago

Hi, the work you have done is very interesting, but we have run into problems when recomputing the metrics reported in the paper. For the PFD2Cityscapes task, we cannot reproduce the FID and KID values shown in the paper. The fake_frame images we used were downloaded directly from "FeaMGAN_PFD_to_CS_Crop352_Full". We have tried calculating the FID between "FeaMGAN_PFD_to_CS_Crop352_Full" and the Cityscapes validation set, and between "FeaMGAN_PFD_to_CS_Crop352_Full" and the Cityscapes train set. Unfortunately, neither FID matches the value you reported. Could you tell us the exact source and target data settings you used when calculating FID and KID? Anyway, thanks a lot! Your work is very impressive.

BonifazStuhr commented 2 months ago

Hi,

thank you for your interest in our work!

For Cityscapes, the FID and KID metric values of FeaMGAN_PFD_to_CS_Crop352_Full and all baselines are calculated with this script: https://github.com/BonifazStuhr/feamgan/blob/main/feamgan/eval/quickEval.py
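For reference, FID is conventionally defined as the Fréchet distance between Gaussians fitted to Inception-v3 features of the real and generated images, with means μ_r, μ_g and covariances Σ_r, Σ_g. This is the standard textbook form, not a line-by-line description of quickEval.py:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Since μ_r and Σ_r are estimated from the target frames, changing the target set (for example, the validation set only vs. all 15000 leftImg8bit_sequence_trainvaltest frames) changes the score even when the generated frames are identical.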

We use all frames from the leftImg8bit_sequence_trainvaltest data of Cityscapes to calculate the FID and KID. leftImg8bit_sequence_trainvaltest contains 15000 images (235 sequences).
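In its standard form, KID is an unbiased estimate of the squared maximum mean discrepancy (MMD) between the two feature sets under the polynomial kernel k(x, y) = (x·y / d + 1)^3, where d is the feature dimension. The pure-Python sketch below shows that estimator on toy 2-D vectors. `poly_kernel` and `kid` are hypothetical names. Real implementations operate on Inception features and are vectorized, and this sketch assumes nothing about how quickEval.py scales or subsets the result:

```python
def poly_kernel(x, y):
    """Polynomial kernel used by KID: k(x, y) = (x.y / d + 1) ** 3."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + 1.0) ** 3


def kid(real, fake):
    """Unbiased MMD^2 estimate between two lists of equal-length feature vectors."""
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")
    # Within-set terms skip the diagonal (i == j); this makes the estimate unbiased.
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # The cross term averages over all m * n pairs.
    k_rf = sum(poly_kernel(r, f) for r in real for f in fake) / (m * n)
    return k_rr + k_ff - 2.0 * k_rf


# Toy "features": a set shifted slightly vs. a set shifted far away.
real = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
near = [[v + 0.1 for v in x] for x in real]
far = [[v + 5.0 for v in x] for x in real]
print(kid(real, near), kid(real, far))
```

Because the within-set sums omit the i == j terms, the estimate can be negative for very similar sets, which is what happens for `near` here.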

Thanks again for your interest in our work!

Sainthousand commented 2 months ago

Hi, thank you for your patient reply. This weekend we tried again.

First, we calculated FID and KID for gta2cityscapes with quickEval.py.

We got FID=40.869 and KID=29.441, which are still slightly off from the reported values.

The settings are:

source: GTA frames translated with the FeaMGAN_PFD_to_CS_Crop352_Full checkpoint; 19252 frames from the GTA train set, size 957*526
target: Cityscapes frames; 15000 images from the Cityscapes validation set, size 2048*1024
all other settings left at their defaults in quickEval.py

Second, we calculated sKVD and cKVD for gta2cityscapes with kvdEval.py.

We got sKVD=9.38 and cKVD_AVG=11.96, which differ clearly from the reported values.

The settings are:

source: GTA frames translated with the FeaMGAN_PFD_to_CS_Crop352_Full checkpoint; 19252 frames from the GTA train set, size 957*526, with the same number of segmentation masks at the same size
target: Cityscapes frames; 15000 images from the Cityscapes validation set, size 957*526, with the same number of segmentation masks at the same size
every_x_steps: 1 (sKVD), 4 (cKVD)
metric_part: [vgg16_f_ll]
all other settings left at their defaults in kvdEval.py

Can you spot any problem with our settings? Thank you!

[Screenshots of the FID/KID, sKVD, and cKVD outputs were attached here.]

BonifazStuhr commented 2 months ago

Hello,

We seem to be getting closer to the reported FID and KID results. A few things still come to mind:

Please make sure the Cityscapes images are saved in .png format (they should be, but double-check to be sure). The image format can affect the metric, as can any scaling applied to the images.
quickEval.py sets a random seed when calculating FID and KID (distUtils.setRandomSeed(42, by_rank=True)), which can affect the metric slightly. You can change the seed to see whether the resulting variation is large enough to explain the remaining difference. A small difference in FID and KID is to be expected since you are running on a different machine.
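We have not traced exactly what distUtils.setRandomSeed controls inside quickEval.py. Assuming it fixes random choices such as which frames or subsets are sampled (KID, for instance, is commonly averaged over random subsets), the sketch below shows the effect: the same seed reproduces the score exactly, while a different seed gives a slightly different value. `subset_scores` and the toy `mean_gap` statistic are hypothetical stand-ins for the real metric:

```python
import random
import statistics


def subset_scores(score_fn, real, fake, n_subsets, subset_size, seed):
    """Mean and stdev of score_fn over random subsets drawn with a fixed seed."""
    rng = random.Random(seed)  # local RNG: the same seed yields the same subsets
    scores = []
    for _ in range(n_subsets):
        r = rng.sample(real, subset_size)
        f = rng.sample(fake, subset_size)
        scores.append(score_fn(r, f))
    return statistics.mean(scores), statistics.stdev(scores)


def mean_gap(real, fake):
    """Toy score: absolute difference between the two sample means."""
    return abs(statistics.mean(real) - statistics.mean(fake))


# Toy 1-D "features" for a real and a generated set.
data_rng = random.Random(0)
real = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
fake = [data_rng.gauss(0.2, 1.0) for _ in range(2000)]

for seed in (42, 43, 44):
    m, s = subset_scores(mean_gap, real, fake, n_subsets=10, subset_size=200, seed=seed)
    print(f"seed={seed}: {m:.4f} +/- {s:.4f}")
```

Comparing the spread across a few seeds with the remaining gap indicates whether seed noise alone could plausibly explain it.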

For sKVD and cKVD, something must differ, since your results are much better than the ones we reported.

A crucial aspect of both metrics is that they rely heavily on the segmentations of the inputs. We inferred the segmentations of PFD and Cityscapes with the MSeg model. To do so, we used Docker/Dockerfile_MSeg and the feamgan/datasetPreperation/createMSegSegmentation.py script (see the README for a short tutorial). Did you use MSeg segmentations for these metrics?
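To illustrate why the segmentation source matters, here is a toy sketch of a class-conditional distance: features are grouped by the segmentation class they come from, a distance is computed for each class present in both sets, and the per-class values are averaged. This is a hypothetical simplification (`classwise_distance`, `mean_gap` and the `min_samples` threshold are illustrative names), not the logic of kvdEval.py. It only shows that relabelling a single sample changes which features get compared, and therefore the score:

```python
import statistics
from collections import defaultdict


def classwise_distance(src, tgt, distance_fn, min_samples=2):
    """Average distance_fn over classes with enough samples in both sets.

    src, tgt: lists of (class_label, feature) pairs; the labels would come
    from a segmentation model. Returns (average, per-class dict).
    """
    src_by_cls, tgt_by_cls = defaultdict(list), defaultdict(list)
    for label, feat in src:
        src_by_cls[label].append(feat)
    for label, feat in tgt:
        tgt_by_cls[label].append(feat)
    per_class = {}
    for label in sorted(src_by_cls.keys() & tgt_by_cls.keys()):
        a, b = src_by_cls[label], tgt_by_cls[label]
        if len(a) >= min_samples and len(b) >= min_samples:
            per_class[label] = distance_fn(a, b)
    if not per_class:
        raise ValueError("no class has enough samples in both sets")
    return statistics.mean(per_class.values()), per_class


def mean_gap(a, b):
    """Toy distance between two groups of scalar features."""
    return abs(statistics.mean(a) - statistics.mean(b))


src = [("road", 0.1), ("road", 0.3), ("car", 2.0), ("car", 2.2), ("sky", 5.0)]
tgt = [("road", 0.2), ("road", 0.2), ("car", 1.0), ("car", 1.4), ("sky", 4.0), ("sky", 4.2)]
avg, per_class = classwise_distance(src, tgt, mean_gap)

# Same features, but a different segmenter labels one target "car" sample as "road".
tgt_relabelled = [("road", 0.2), ("road", 0.2), ("road", 1.0), ("car", 1.4), ("sky", 4.0), ("sky", 4.2)]
avg2, per_class2 = classwise_distance(src, tgt_relabelled, mean_gap)
print(avg, avg2)
```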

BonifazStuhr/feamgan, issue #4