Wrong Inception Score Calculation

LukasDotEU commented 6 months ago

When running my own inception score calculation script I get a way higher Inception Score then your script is returning and that is reported in the paper. My script however gives around the same inception score when running on other models as reported from the authors of the other models. Also, your inception score is lower than the one reported in other papers that are using a GAN model, although your images are way better quality wise as the images generated by such a GAN model.

I haven't debugged your inception calculation script but I do think, that there is something wrong, as the inception score should be way better (between 25 and 35 according to my script, depending on the settings used in your model).

RomGai commented 5 months ago

OMG, thank you very much for your feedback! I will carefully check the code later. I apologize for being busy with other work recently. I am pleased to learn that the actual IS can outperform the ones we reported in the paper (although this might be due to errors in our code, really sorry for that).

Additionally, many works employing GANs have some methodological issues because they use the uncorrected original dataset released in 2017, which has been questioned for its potential design flaws. Experiments on that dataset have resulted in surprisingly high classification accuracies, indicating that the generated results are usually correct. Our method was initially trained and tested on that dataset as well. I remember this because during one test, the accuracy exceeded 90% or 95% (probably, I can't remember clearly), which made me question the validity of the experimental results. This prompted a thorough examination of the dataset and related literature, and we then used its revised version for scientificity.

Considering that IS measures image quality (how recognizable they are) and diversity (whether the number of generated images across different categories is balanced), their results might be very high. This is mainly because their method showed very high generation accuracy on the flawed dataset, resulting in almost all the generated images being correct, with a similar number of images generated across different categories (because the distribution of image categories in the test set is also relatively uniform).

RomGai commented 4 months ago

Thank you very much! I have uploaded a new script for calculating IS. Besides, I also uploaded a script for calculating FID.

LukasDotEU commented 4 months ago

Awesome, so what are the scores you are getting?

RomGai commented 4 months ago

Really better than I expected. The average IS is 30.9985, and the average FID is 126.6576.

RomGai / BrainVis

Wrong Inception Score Calculation #6