Seanseattle / SMIS

Semantically Multi-modal Image Synthesis (CVPR 2020)

question about results #6

Open Ha0Tang opened 4 years ago

Ha0Tang commented 4 years ago

Hi, why are your results in Table 2 (Cityscapes and ADE20K) different from those in the SPADE paper, even though you used the same dataset train/test splits?

Seanseattle commented 4 years ago

We used the pretrained models from the SPADE GitHub and tested with the dataset's evaluation method, which means we ignored the "don't care" label. For FID, we used the updated PyTorch version of FID, which gives the same results as the official TensorFlow implementation, while SPADE might have used the older version. So some results may differ.
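For context on why the "don't care" label matters: if those pixels are kept, they always count as misclassified, which drags pixel accuracy down. A minimal NumPy sketch (the toy arrays and the ignore index 255, the Cityscapes convention, are illustrative, not the repo's actual evaluation code):

```python
import numpy as np

IGNORE_LABEL = 255  # Cityscapes "don't care" pixels (trainId convention)

def pixel_accuracy(pred, gt, ignore_label=IGNORE_LABEL):
    """Fraction of correctly labelled pixels, skipping ignored ones."""
    valid = gt != ignore_label
    return (pred[valid] == gt[valid]).mean()

# Toy example: 2 of 5 pixels are "don't care".
gt   = np.array([1, 2, 255, 3, 255])
pred = np.array([1, 2, 0,   0, 1])

print(pixel_accuracy(pred, gt))  # ignore mask: 2 correct of 3 valid -> ~0.667
print((pred == gt).mean())       # no mask: 2 correct of 5 pixels -> 0.4
```

The gap between the two numbers is exactly the kind of discrepancy discussed in this thread.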

Ha0Tang commented 4 years ago

The SPADE paper reports 62.3 mIoU and 81.9 acc on Cityscapes, but you report 62.3 mIoU and 93.5 acc. Why is the mIoU the same while the acc is so different?

The SPADE paper reports 38.5 mIoU and 79.9 acc on ADE20K, but you report 42.0 mIoU and 81.4 acc. Why are both so different?

jessemelpolio commented 4 years ago

We cannot reproduce the results in the SPADE paper using the evaluation method it describes, and it seems we are not the only ones who have run into this problem. Please see https://github.com/NVlabs/SPADE/issues/39, https://github.com/NVlabs/SPADE/issues/100, and https://github.com/xh-liu/CC-FPSE/issues/4.

It remains an unsolved problem, and people continue to get results that differ from each other. In our case, we use the model (https://drive.google.com/file/d/12gvlTbMvUcJewQlSEaZdeb2CdOB-b8kQ/view) provided by SPADE to test the segmentation results on Cityscapes, but we follow some users (https://github.com/xh-liu/CC-FPSE/issues/4#issuecomment-574221689) in ignoring the 'don't care' class. We believe ignoring this class is the main reason our tested results are better, as also pointed out in those issues. It could simply be a coincidence that the mIoU stays at 62.3 both in our testing and in the SPADE paper.

Our evaluation setting is applied to all models in Tab. 2, which at least makes the comparison fair on our end. If you find out how to solve this problem, we would be glad to hear about it, give it a try, and update our results if necessary.
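To make the "ignore the 'don't care' class" setting concrete: mIoU is typically computed from a confusion matrix built only over valid pixels, so the ignored pixels drop out of both intersection and union. A hedged sketch (this is an illustration of the setting described above, not the evaluation code used in the paper; class count and arrays are made up):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_label=255):
    """Mean intersection-over-union from a confusion matrix,
    with 'don't care' pixels excluded entirely."""
    valid = gt != ignore_label
    # Confusion matrix: rows = ground truth class, cols = predicted class.
    conf = np.bincount(
        num_classes * gt[valid] + pred[valid],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)  # avoid division by zero
    return iou[union > 0].mean()        # average over classes that appear

# Toy example with 2 classes and one "don't care" pixel.
gt   = np.array([0, 0, 1, 1, 255])
pred = np.array([0, 1, 1, 1, 0])
print(mean_iou(pred, gt, num_classes=2))  # (0.5 + 2/3) / 2 = 7/12
```

Whether the stray prediction on an ignored pixel counts against a class (as it would without the mask) is precisely the evaluation-protocol choice that can move acc by several points while leaving mIoU nearly unchanged.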