Rayhane-mamah / Efficient-VDVAE

Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more"
https://arxiv.org/abs/2203.13751
MIT License

Incorrect Imagenet dataset #6

Open prafullasd opened 2 years ago

prafullasd commented 2 years ago

I downloaded the Imagenet dataset linked in this repo, and I think it (50,000 test images with labels, box downsampling) doesn't match the official Imagenet 32x32/64x64 versions used for NLL benchmarks (https://github.com/openai/vdvae/blob/main/setup_imagenet.sh: 49,999 test images with no labels, downloadable from https://academictorrents.com/details/96816a530ee002254d29bf7a61c0c158d3dedc3b). The difference in the downsampling method used during pre-processing makes the NLLs not comparable.
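For anyone trying to check which version they have, a minimal sketch is to look at the test-set size and whether labels are present. The file name and `.npz` layout below are assumptions, not the actual format of either release, so adapt them to however your copy is stored:

```python
# Hypothetical sanity check: tell the two ImageNet 32x32 test sets apart
# by image count and presence of labels. The file name and keys below are
# assumptions -- adapt them to however your copy is stored.
import numpy as np

data = np.load("imagenet32_val.npz", allow_pickle=True)  # hypothetical path

print("test images:", len(data["data"]))      # 50000 -> box-downsampled release
                                              # 49999 -> benchmark release
print("has labels:", "labels" in data.files)  # labels only in the box-downsampled release
```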

Rayhane-mamah commented 2 years ago

Hello @prafullasd

Thank you for the interest you show in this work and thanks for reaching out about this issue!

You bring up a very good point. We were already skeptical of the results achieved on the Imagenet datasets in our work, since our NLL results were very different from the VDVAE baseline. The confusing part is that the Imagenet version used in NLL benchmarks used to be hosted on that website (as the VDVAE download script points out), and since the website was updated, the downsampled Imagenet it hosts is now produced with a different downsampling method (this is the part we missed). Adding to the confusion, some prior work also seems to use the incorrect Imagenet version.
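To illustrate why the downsampling method alone makes the likelihoods incomparable, here is a small example. It is not the exact pipeline of either release; the input file and the choice of filters are placeholders. It only shows that two resampling filters produce different pixel values, so a model's NLL on one version does not carry over to the other:

```python
# Illustration only (not the exact pipeline of either release): the same
# source image downsampled with two different filters yields different
# pixels, so likelihoods measured on one version do not carry over.
import numpy as np
from PIL import Image

img = Image.open("example.jpg").convert("RGB")  # hypothetical input image
box = np.asarray(img.resize((32, 32), Image.BOX), dtype=np.int16)
lanczos = np.asarray(img.resize((32, 32), Image.LANCZOS), dtype=np.int16)

# Any non-zero difference means the two "ImageNet 32x32" datasets differ.
print("mean abs pixel difference:", np.abs(box - lanczos).mean())
```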

For completeness and future reference:

The NLL results reported on Imagenet will probably change once we switch between these two versions of the dataset. We will update these metrics as we re-run the experiments (this will take some time). The general expectation is that our reported NLL should get closer to the NLL reported by VDVAE, which would make sense and would match our findings on FFHQ 5-bits.
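For readers comparing numbers across papers, NLLs on these benchmarks are usually reported in bits per dimension. A quick sketch of the conversion from a per-image NLL in nats (the example value here is made up, purely for illustration):

```python
import numpy as np

def nats_to_bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image NLL in nats to bits per dimension.

    For 32x32 RGB images, num_dims = 32 * 32 * 3 = 3072.
    """
    return nll_nats / (num_dims * np.log(2))

# Made-up example value, purely for illustration:
print(nats_to_bits_per_dim(7700.0, 32 * 32 * 3))  # ~3.62 bits/dim
```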

While noting this mistake is important and ensures full research correctness, it doesn't affect the main contribution of the paper much: "Efficient-VDVAE keeps comparable or better NLL performance with less memory and faster training". Nevertheless, precision in reporting the results is very much desirable.

Thank you very much for pointing this out and helping improve the quality of our work! I will keep the issue open until we make our updates. If you find any other problems, please let us know, we appreciate this a lot!

Rayhane.