The Autoencoding Variational Autoencoder (AVAE) with convolutional architecture

I am trying to replicate the paper's results for the AVAE trained with a convolutional architecture, but my results on ColorMNIST are poor: new samples generated from the decoder are very bad, while reconstructions of the original images are very good. The only change I made was to replace the fully connected encoder and decoder networks in the repository with the convolutional (VGG-based) architecture described in the appendix. The FID I get is much larger than the values reported in Table 1.

Is there anything in particular to take care of in the current training pipeline when the architecture is changed, for example in the loss implementation (such as the hyperparameters weighting the different loss terms)? Are there any plans to open-source the training code for the convolutional architecture, or to upload model checkpoints?

Any pointers would be really helpful. Thanks in advance!
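To make the question about loss weighting concrete, here is a minimal sketch of the kind of per-term check I have in mind. It is not the repository's code and it leaves out the AVAE-specific terms: it is a plain Gaussian VAE objective written with NumPy, and the names `gaussian_kl`, `loss_terms`, and `kl_weight` are hypothetical. The point is only that a summed reconstruction error scales with the number of pixels, so the balance against the KL term can shift when the architecture or input shape changes.

```python
import numpy as np


def gaussian_kl(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, I)), summed over latent dims, per example."""
    return 0.5 * np.sum(np.exp(log_var) + mean**2 - 1.0 - log_var, axis=-1)


def loss_terms(x, x_recon, mean, log_var, kl_weight=1.0):
    """Return batch-averaged loss terms separately so their scales can be compared.

    kl_weight is a hypothetical knob, not a parameter taken from the repository.
    """
    # Squared error summed over every non-batch dimension (height, width, channels).
    non_batch_axes = tuple(range(1, x.ndim))
    recon = np.sum((x - x_recon) ** 2, axis=non_batch_axes)
    kl = gaussian_kl(mean, log_var)
    total = recon + kl_weight * kl
    return {
        "recon": float(recon.mean()),
        "kl": float(kl.mean()),
        "total": float(total.mean()),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=(4, 28, 28, 3))                   # stand-in image batch
    x_recon = x + 0.01 * rng.normal(size=x.shape)          # near-perfect reconstructions
    mean = rng.normal(size=(4, 16))                        # stand-in encoder outputs
    log_var = np.zeros((4, 16))
    print(loss_terms(x, x_recon, mean, log_var))
```

Logging the terms separately like this would at least show whether one of them dominates after the architecture swap, which is the situation I suspect when reconstructions are good but samples from the prior are not.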

google-deepmind / deepmind-research

The Autoencoding Variational Autoencoder (AVAE) with convolutional architecture #336