gaozhihan / PreDiff

[NeurIPS 2023] Official implementation of "PreDiff: Precipitation Nowcasting with Latent Diffusion Models"
Apache License 2.0

The latent shape (4, 16, 16) in the paper does not match the code (64, 16, 16) #8

Open xiaochengfuhuo opened 10 months ago

xiaochengfuhuo commented 10 months ago

The latent_channels in scripts/vae/sevirlr/cfg.yaml is 64, but the latent_channels in the paper's Implementation Details is 4. When latent_channels is set to 64, does the latent space still reduce the training time during denoising? After all, the original data size then equals the latent size: 128 × 128 = 64 × 16 × 16.
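As a quick check of that arithmetic (the shapes below are only the numbers quoted in this thread, not pulled from the repo code):

```python
# Illustrative shape arithmetic from the thread: a 128 x 128 input frame
# compressed 8x spatially gives a 16 x 16 latent grid.
input_pixels = 128 * 128   # elements in one input frame
latent_4 = 4 * 16 * 16     # latent size with latent_channels = 4 (paper)
latent_64 = 64 * 16 * 16   # latent size with latent_channels = 64 (this repo)

print(input_pixels, latent_4, latent_64)  # 16384 1024 16384
# With 64 channels, the latent has exactly as many elements as the raw
# frame, i.e. there is no reduction in element count.
assert latent_64 == input_pixels
```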

gaozhihan commented 10 months ago

Thank you for your question. We followed the default config in LDM and used latent_channels = 4 for the performance reported in our paper. We found that setting latent_channels = 64 gives a more robust model that is less sensitive to the optimization hyperparameters. In this repo, we set latent_channels = 64 to make our results easier to reproduce, and we release the corresponding pretrained weights for consistency.
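For anyone who wants to try the paper's setting instead, here is a minimal sketch of overriding the value, assuming the config is OmegaConf-compatible YAML; the key path `model.latent_channels` is a hypothetical placeholder, so check scripts/vae/sevirlr/cfg.yaml for the actual layout:

```python
from omegaconf import OmegaConf

# Hypothetical sketch: load the VAE config and switch latent_channels
# from the repo default (64) to the paper setting (4).
cfg = OmegaConf.load("scripts/vae/sevirlr/cfg.yaml")
OmegaConf.update(cfg, "model.latent_channels", 4)  # key path is an assumption
OmegaConf.save(cfg, "scripts/vae/sevirlr/cfg_lc4.yaml")
```

Note that pretrained weights trained with latent_channels = 64 would not load into a model built with latent_channels = 4, so this change implies retraining the VAE.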

xiaochengfuhuo commented 10 months ago

Thank you for your reply; that is helpful. What about the training and sampling time? Will latent_channels = 64 take much longer than latent_channels = 4?

gaozhihan commented 10 months ago

Thank you for your follow-up question. The computational cost is not bottlenecked by the latent_channels hyperparameter. In our VAE, the channel dimensions always increase to 512 regardless of the value of latent_channels; similarly, in our Earthformer-UNet they always increase to 256 and 512. As such, the choice of latent_channels does not significantly impact training or inference time. In our experiments, changing latent_channels from 4 to 64 did not even double the computational cost.
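To see why, here is a toy sketch (not the actual PreDiff architecture) comparing two small conv encoders whose hidden widths grow to 512 either way; only the final projection layer depends on latent_channels, so the total parameter count barely moves:

```python
import torch.nn as nn

def toy_encoder(latent_channels: int) -> nn.Sequential:
    # Toy stand-in for a VAE encoder: hidden widths grow to 512
    # regardless of latent_channels; only the last 1x1 projection
    # depends on it. Not the actual PreDiff model.
    return nn.Sequential(
        nn.Conv2d(1, 128, 3, stride=2, padding=1),
        nn.Conv2d(128, 256, 3, stride=2, padding=1),
        nn.Conv2d(256, 512, 3, stride=2, padding=1),
        nn.Conv2d(512, latent_channels, 1),
    )

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

p4, p64 = n_params(toy_encoder(4)), n_params(toy_encoder(64))
# The ratio stays close to 1 (~1.02 here): the 512-wide middle layers
# dominate the cost, not the latent projection.
print(p4, p64, p64 / p4)
```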

xiaochengfuhuo commented 10 months ago

Thanks a lot. I got it.