Hi, in the tutorials the training samples are downsampled to low-quality ones, and the models are only trained for a few hundred epochs. So I think you just need to train the model with high-quality images, make the model bigger, and train it for long enough.
Hi @sulaimanvesal
You may also increase the quality of the latent model by using a perceptual loss and training adversarially. There is an example of how to train a VQ-GAN here.
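For reference, here is a minimal sketch of what that loss setup can look like with MONAI Generative's `PerceptualLoss`, `PatchAdversarialLoss`, and `PatchDiscriminator`. The weights and the dummy tensors are illustrative only, not tuned values:

```python
# Sketch of an adversarial + perceptual loss setup for the autoencoder,
# using MONAI Generative components; all weights are illustrative, not tuned.
import torch
import torch.nn.functional as F
from generative.losses import PatchAdversarialLoss, PerceptualLoss
from generative.networks.nets import PatchDiscriminator

discriminator = PatchDiscriminator(spatial_dims=3, num_layers_d=3, num_channels=64, in_channels=1, out_channels=1)
perceptual_loss = PerceptualLoss(spatial_dims=3, network_type="squeeze", is_fake_3d=True)
adv_loss = PatchAdversarialLoss(criterion="least_squares")

perceptual_weight = 0.001  # illustrative; needs tuning per dataset
adv_weight = 0.01          # illustrative; needs tuning per dataset

# Dummy tensors standing in for a real batch and its reconstruction:
images = torch.rand(1, 1, 64, 64, 64)
reconstruction = torch.rand(1, 1, 64, 64, 64)

# Generator-side loss: L1 + perceptual + adversarial.
logits_fake = discriminator(reconstruction.contiguous())[-1]
loss_g = (
    F.l1_loss(reconstruction, images)
    + perceptual_weight * perceptual_loss(reconstruction, images)
    + adv_weight * adv_loss(logits_fake, target_is_real=True, for_discriminator=False)
)

# Discriminator-side loss: real vs. reconstructed patches.
logits_fake = discriminator(reconstruction.contiguous().detach())[-1]
logits_real = discriminator(images.contiguous().detach())[-1]
loss_d = 0.5 * (
    adv_loss(logits_fake, target_is_real=False, for_discriminator=True)
    + adv_loss(logits_real, target_is_real=True, for_discriminator=True)
)
```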
Thank you both for the suggestions. As @xiaoiker suggested, I tried a higher image resolution (160x224x160) and deeper models for both the autoencoder and the diffusion UNet (inspired by the config file from the Model Zoo for brain MRI synthesis). Both models (autoencoder and UNet) were trained for 500 epochs each, which took a few days.
@marksgraham The autoencoder was trained with a perceptual loss.
I observed some improvements compared to my initial attempts, but the results are still far behind the paper's. Perhaps I need to increase the number of training samples further, or maybe move to a 2D LDM.
Hi @sulaimanvesal, thank you for your interest in using our package for your project, and thanks for sharing your progress.
As @marksgraham mentioned, having a good "compression model" (VQ-GAN or AE) is very important: it defines the maximum quality you can obtain for your images. One way to evaluate how well your compression model is performing is to measure the reconstruction quality with MS-SSIM (see the sketch below). I describe this evaluation process in this post (https://medium.com/towards-data-science/generating-medical-images-with-monai-e03310aa35e6); I hope it is helpful.

Tuning the weights of the perceptual, L1/L2, and adversarial losses is crucial, as is the learning rate of the adversarial component. Checking the balance between the discriminator and generator losses is very important for getting the images' fine details: look at the learning curves of both components and make sure the discriminator does not collapse during training (i.e., its loss does not get too low). Checking the magnitude of the gradients can be helpful too.
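As an illustration, here is a hedged sketch of that MS-SSIM evaluation, using MONAI Generative's `MultiScaleSSIMMetric`; `autoencoder` and `val_loader` are placeholders for your trained AutoencoderKL and your validation DataLoader:

```python
# Sketch: measuring reconstruction quality of the compression model with MS-SSIM.
# `autoencoder` and `val_loader` are placeholders; intensities are assumed to be
# scaled to [0, 1] (hence data_range=1.0).
import torch
from generative.metrics import MultiScaleSSIMMetric

ms_ssim = MultiScaleSSIMMetric(spatial_dims=3, data_range=1.0, kernel_size=4)

autoencoder.eval()
scores = []
with torch.no_grad():
    for batch in val_loader:
        images = batch["image"]
        reconstruction, _, _ = autoencoder(images)  # AutoencoderKL returns (recon, z_mu, z_sigma)
        scores.append(ms_ssim(images, reconstruction))
print(f"Mean MS-SSIM: {torch.cat(scores).mean().item():.4f}")  # closer to 1.0 is better
```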
Having enough capacity in your models (number of channels) and compressing to a larger latent space helps preserve the details for the diffusion model to learn, although it can also make the task harder for that second model (see the sketch of the relevant knobs below). I recommend starting with a small training set, where you can run quicker tests and evaluations, and then gradually moving to the larger dataset.
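To make the capacity/latent-size trade-off concrete, here is a hedged sketch of the relevant `AutoencoderKL` arguments; the values are illustrative starting points, not recommendations:

```python
# Sketch of the capacity knobs on MONAI Generative's AutoencoderKL;
# the numbers below are illustrative, not recommendations.
from generative.networks.nets import AutoencoderKL

autoencoder = AutoencoderKL(
    spatial_dims=3,
    in_channels=1,
    out_channels=1,
    num_channels=(64, 128, 256),            # more channels per level -> more capacity
    latent_channels=8,                      # wider latent keeps more detail, but is harder to diffuse
    num_res_blocks=2,
    attention_levels=(False, False, False),
    norm_num_groups=32,
)
# The input is downsampled once per level transition: with three levels it is
# downsampled twice (factor 4 per axis), so a 160x224x160 volume maps to a
# 40x56x40 latent.
```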
I hope these comments and materials help. We are planning to share a tutorial on fine-tuning pre-trained models soon, which I hope will be helpful too.
@Warvito Hi, I tried your code for training the AutoencoderKL and the diffusion model for my own project. It won't work for resolutions higher than 64x64 (I tested it on 256x256). Any guidance would be appreciated. Thanks a lot!
Hi @Warvito,
Thank you for providing detailed information. I was able to obtain some decent prostate MRI images using AutoencoderKL but, as you suggested, I replaced it with a 3D VQ-VAE. The image reconstruction quality of the VQ-VAE is far better than AutoencoderKL's. I was hoping that the DiffusionUNet would now generate higher-quality images, but despite the training curve indicating convergence, the output is just noise, as shown in the attached example.
Regarding implementation, I closely followed the VQ-VAE recipe with perceptual loss and adversarial training. I also looked into LDM-1 from Rombach et al. (appendix) for hyperparameter tuning. I thought that perhaps the diffusion UNet needed to be changed, so I trained a few models with different configurations, but during inference the models only produced noise. This behavior was not observed with AutoencoderKL.
Any suggestions on why the VQ-VAE didn't work? Do you have a tutorial for using the 3D LDM with a VQ-VAE rather than AutoencoderKL?
Hi @sulaimanvesal
Would you be able to share your training code for the LDM trained on top of the 3D VQ-VAE?
Hi @marksgraham
Here is the training code on my OneDrive.
Any update on the 3D LDM+VQ-VAE code review? I am still thinking how to improve this further.
You need to grant me access to view the code - have requested.
Hi @sulaimanvesal
I've taken a look at the code and can't see anything glaringly wrong that would produce those samples, but I have some thoughts. I have successfully trained a VQ-VAE LDM on 3D data with MONAI Generative, so I know it is possible!
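For comparison, here is a hedged sketch of what one LDM training step over 3D VQ-VAE latents can look like with MONAI Generative; `vqvae` and `train_loader` are placeholders for your trained compression model and DataLoader, and the network sizes are illustrative only. One easy-to-miss detail (a common cause of noise-only samples) is rescaling the latents to roughly unit variance before diffusion:

```python
# Sketch of LDM training on 3D VQ-VAE latents with MONAI Generative.
# `vqvae` and `train_loader` are placeholders; network sizes are illustrative.
import torch
import torch.nn.functional as F
from monai.utils import first
from generative.networks.nets import DiffusionModelUNet
from generative.networks.schedulers import DDPMScheduler

unet = DiffusionModelUNet(
    spatial_dims=3,
    in_channels=3,   # must match the VQ-VAE's latent (embedding) channels
    out_channels=3,
    num_res_blocks=1,
    num_channels=(32, 64, 64),
    attention_levels=(False, True, True),
    num_head_channels=(0, 64, 64),
)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.Adam(unet.parameters(), lr=1e-4)

# Diffusion models expect roughly unit-variance inputs; VQ-VAE latents usually
# are not, so estimate a scale factor from one batch (as in the LDM recipe).
with torch.no_grad():
    sample_latents = vqvae.encode_stage_2_inputs(first(train_loader)["image"])
scale_factor = 1.0 / torch.std(sample_latents)

for batch in train_loader:
    with torch.no_grad():
        latents = vqvae.encode_stage_2_inputs(batch["image"]) * scale_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, scheduler.num_train_timesteps, (latents.shape[0],)).long()
    noisy_latents = scheduler.add_noise(original_samples=latents, noise=noise, timesteps=timesteps)
    noise_pred = unet(x=noisy_latents, timesteps=timesteps)
    loss = F.mse_loss(noise_pred.float(), noise.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At sampling time, remember to undo the scaling before decoding, e.g.
# vqvae.decode_stage_2_outputs(sampled_latents / scale_factor).
```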
Hi all,
Thank you for these amazing efforts to put LDM and DDM together.
I have one question on how to improve the quality of the LDM. I tried to train an LDM, but the quality of the generated images is pretty low. I also saw in the tutorials that the generated brain MRI images aren't of good quality.
Any insights on this would be great.