Closed — fanghenshaometeor closed this issue 9 months ago
Hi,
When we conducted our experiments, we assumed the ImageNet LDM would use only the ImageNet dataset, overlooking this detail of the VQ-VAE training. Table 8 in their paper does state that OpenImages was used to train the VQ-VAE. It is indeed an issue to use such an LDM in OoD benchmarks.
However, I think this issue may not alter our conclusion with LDM, for the following reasons:
One way to resolve this would be to train another LDM on the ImageNet dataset only. Sadly, we do not have the time or resources to do this.
Thank you for raising the issue. I will update the README later to highlight it so that other followers notice.
Best, Ruiyuan
Thanks for your kind reply. It would be much better to add an essential explanation in the README to remind other researchers of this issue. Strictly speaking, both the VQ-VAE and the latent diffusion model should be trained on the in-distribution data.
I agree. I have updated the README to let others know about the issue with the pre-trained LDM weights.
The experiments cover Latent Diffusion Models (LDMs). However, I would like to raise a question for discussion.
Training an LDM includes two steps: Step 1, training a VQ-VAE; Step 2, training a diffusion model in the latent space produced by the VQ-VAE encoder. Regarding the checkpoints released by the LDM paper, the VQ-VAE is trained on the OpenImages dataset, and the diffusion models are then trained on a specific dataset, e.g., ImageNet. Therefore, the LDMs on ImageNet actually carry an implicit prior over more knowledge than the ImageNet dataset, since they are built on an OpenImages-trained latent space.
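The two-step structure can be sketched numerically. This is a toy illustration only (not the real LDM code): a k-means codebook stands in for the VQ-VAE encoder/quantizer, and the arrays named `open_images` and `imagenet` are synthetic stand-ins for the broad pretraining set and the narrower target set. The point it shows is structural: the latent space modeled in step 2 is shaped by whatever data the step-1 quantizer was fit on.

```python
# Toy two-stage sketch: stage 1 fits a quantizer on broad "pretraining" data,
# stage 2 would then model the quantized latents of the target data.
# All names/data are illustrative assumptions, not the LDM paper's setup.
import numpy as np

rng = np.random.default_rng(0)

def fit_codebook(data, k=4, iters=20):
    """Stage 1: k-means codebook, standing in for the VQ-VAE quantizer."""
    codebook = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest code
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = data[assign == j]
            if len(members):  # skip empty clusters
                codebook[j] = members.mean(0)
    return codebook

def encode(data, codebook):
    """Quantize each vector to its nearest codebook entry."""
    dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[dists.argmin(1)]

# Synthetic stand-ins: a wide pretraining distribution vs. a narrower target.
open_images = rng.normal(0.0, 3.0, size=(512, 2))
imagenet    = rng.normal(0.0, 1.0, size=(256, 2))

codebook = fit_codebook(open_images)   # step 1: quantizer fit on broad data
latents  = encode(imagenet, codebook)  # step 2 would model these latents

# The codebook (hence the latent space) reflects the broad pretraining
# distribution, not only the target dataset the diffusion model sees.
print(codebook.shape, latents.shape)   # (4, 2) (256, 2)
```

In this sketch, swapping `open_images` for `imagenet` in `fit_codebook` would give a latent space derived purely from the target data, which is the retraining fix discussed above.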
Accordingly, it might not be appropriate to use such LDMs in an OoD detection task, since the in-distribution data used to train them actually covers more than the ImageNet distribution.