CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

How to train on custom dataset without text prompts and class conditioning #211

Open NicolasNerr opened 1 year ago

NicolasNerr commented 1 year ago

Hello,

I want to apply the Latent Diffusion Model to medical image data. How can I feed training images from a directory of .jpg files to train the diffusion model? Also, I don't want the model to be conditioned on classes or on text.

I would love any advice on how to do that.

Thank you so much

PedestrianZQZ commented 1 year ago

I'm facing the same problem. Have you found a solution?

Yoonho-Na commented 1 year ago

I haven't tried it yet, but you should check out the taming-transformers repo; there's a description of how to train on a custom dataset. The overall framework is similar, so it might be a clue.
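
For the original question about feeding a folder of .jpg files: a minimal dataset class might look like the sketch below. The class name, folder path, and resolution are placeholders, and the `{"image": ...}` dict with HWC values scaled to [-1, 1] mirrors what the repo's existing datasets return.

```python
import glob
import os

import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class FolderDataset(Dataset):
    """Loads every .jpg in a flat directory as an unconditional training example."""

    def __init__(self, root, size=256):
        self.paths = sorted(glob.glob(os.path.join(root, "*.jpg")))
        self.size = size

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        image = Image.open(self.paths[i]).convert("RGB")
        image = image.resize((self.size, self.size), resample=Image.BICUBIC)
        image = np.array(image).astype(np.float32) / 127.5 - 1.0  # scale to [-1, 1]
        # HWC float array under the "image" key, like the repo's other datasets
        return {"image": image}
```

Such a class can then be referenced from the `data.params.train.target` field of a training config, the same way the existing dataset classes are.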

PedestrianZQZ commented 1 year ago

> I haven't tried it yet, but you should check out the taming-transformers repo; there's a description of how to train on a custom dataset. The overall framework is similar, so it might be a clue.

Thanks a lot!

Suimingzhe commented 1 year ago

If I want to train an unconditional LDM on my custom dataset (without any conditioning), are these the right steps?

  1. Train a VQGAN on the custom dataset to get a well-trained autoencoder.
  2. Train an LDM with the pre-trained autoencoder.

Many pre-trained autoencoders (VQ and KL) have been provided, but I'm not sure whether they can be used on a custom dataset so that the first stage can be skipped and one can go straight to the second stage.

Yoonho-Na commented 1 year ago

@Suimingzhe I went through a lot of trial and error training LDMs on a custom dataset and have successfully trained them now. Yes, those steps are correct.

As for the second question, in my opinion that depends on your dataset. If your data is significantly different from the pre-training dataset, you won't achieve good performance. I think the best way to find out is to send your data through the pre-trained AE model and look at the reconstruction quality. If you are happy with the pre-trained reconstruction quality, then you are good to go to the 2nd stage.

Suimingzhe commented 1 year ago

@Yoonho-Na Thanks for your advice! Could you please show me a simple example of how to check the reconstruction performance using a pre-trained VQGAN autoencoder? I would appreciate it!

Yoonho-Na commented 1 year ago

@Suimingzhe

  1. First, simply compare the reconstructed images with the actual inputs visually. If the reconstructions look quite different (meaning the pre-trained AE cannot compress and reconstruct your data well), then go train the AE on your custom dataset. (A rough sketch of this check is shown below.)
  2. If the input and reconstructed images look similar enough, compare them with actual evaluation metrics such as MSE, SSIM, LPIPS, etc. There are plenty of metrics for comparing two images.
  3. If you are satisfied with the metric results from step 2, train the LDM with that pre-trained AE. If not, train the AE on your custom data.
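
To make step 1 concrete, here is a rough sketch of pushing one image through a pre-trained KL autoencoder and saving the reconstruction. The config/checkpoint/image paths are placeholders, and the sketch assumes the KL first-stage model (the VQ model's `encode` returns a different tuple):

```python
import numpy as np
import torch
from omegaconf import OmegaConf
from PIL import Image

from ldm.util import instantiate_from_config

# placeholder paths -- point these at the first-stage config/checkpoint you downloaded
config = OmegaConf.load("models/first_stage_models/kl-f4/config.yaml")
state = torch.load("models/first_stage_models/kl-f4/model.ckpt", map_location="cpu")

model = instantiate_from_config(config.model)
model.load_state_dict(state["state_dict"], strict=False)
model.eval()

# load one image the same way the training datasets do: resize, scale to [-1, 1]
image = Image.open("my_image.jpg").convert("RGB").resize((256, 256))
x = torch.from_numpy(np.array(image).astype(np.float32) / 127.5 - 1.0)
x = x.permute(2, 0, 1).unsqueeze(0)  # HWC -> NCHW

with torch.no_grad():
    posterior = model.encode(x)               # DiagonalGaussianDistribution (KL model)
    recon = model.decode(posterior.sample())  # back to pixel space

recon = ((recon.clamp(-1, 1) + 1) / 2 * 255).squeeze(0).permute(1, 2, 0).numpy().astype(np.uint8)
Image.fromarray(recon).save("reconstruction.png")
```
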
Suimingzhe commented 1 year ago

@Yoonho-Na Thanks again. I will try it.

DongyangHuLi commented 1 year ago

> @Suimingzhe I went through a lot of trial and error training LDMs on a custom dataset and have successfully trained them now. Yes, those steps are correct.
>
> As for the second question, in my opinion that depends on your dataset. If your data is significantly different from the pre-training dataset, you won't achieve good performance. I think the best way to find out is to send your data through the pre-trained AE model and look at the reconstruction quality. If you are happy with the pre-trained reconstruction quality, then you are good to go to the 2nd stage.

Thank you for your answer. I am interested in the details of your training. May I ask what image resolution your dataset has? How much GPU memory is required for training, and how long does it take? As far as I know, diffusion models have a high memory overhead and require long training. Could you please give me some details? Looking forward to your reply!

Yoonho-Na commented 1 year ago

@DongyangHuLi The resolution was 1x256x256. I don't remember the exact time spent training the AE and the LDM; training time depends on the dataset, batch size, learning rate, and even the hardware. There are lots of parameters that affect training time. In my case it took a few hours to train the AE and about a day for the LDM. Memory is also affected by many parameters; mine took about 4x60 GB on 4 NVIDIA A100s. You should just go try training your LDMs. The LDM does the diffusion and denoising process in a much lower-dimensional latent space, so it won't use as much memory as you might expect.

DongyangHuLi commented 1 year ago

> @DongyangHuLi The resolution was 1x256x256. I don't remember the exact time spent training the AE and the LDM; training time depends on the dataset, batch size, learning rate, and even the hardware. There are lots of parameters that affect training time. In my case it took a few hours to train the AE and about a day for the LDM. Memory is also affected by many parameters; mine took about 4x60 GB on 4 NVIDIA A100s. You should just go try training your LDMs. The LDM does the diffusion and denoising process in a much lower-dimensional latent space, so it won't use as much memory as you might expect.

Thank you!

zhangdan8962 commented 1 year ago

Hi @Yoonho-Na,

I am wondering whether you trained the LDM on a custom dataset. If so, did you observe the model performing differently on training and testing data?

Yoonho-Na commented 1 year ago

@zhangdan8962 Yes, I did, and I didn't observe a big difference. Did you modify the training step function? If you did, you have to modify the validation step in the same way. That might be the reason.

zhangdan8962 commented 1 year ago

@Yoonho-Na No, I did not modify the training step, and I am thinking it might be overfitting. I am not familiar with PyTorch Lightning; do you know why there are no generated images from the validation set during training?

ustczhouyu commented 1 year ago

Hello, I would like to ask about the difference between an unconditional LDM and a conditional LDM. After the model is trained, does unconditional sampling generate images randomly, rather than based on a given image? So, if I want to generate a normal image from a flawed image (without any annotations in the inference phase), should I use a conditional LDM?

ustczhouyu commented 1 year ago

Hello, I would like to ask about the difference between an unconditional LDM and a conditional LDM. After the model is trained, does unconditional sampling generate images randomly, rather than based on a given image? So, if I want to generate a normal image from a flawed image (without any annotations in the inference phase), should I use a conditional LDM? @Yoonho-Na @PedestrianZQZ @zhangdan8962 @Suimingzhe

Suimingzhe commented 1 year ago

> Hello, I would like to ask about the difference between an unconditional LDM and a conditional LDM. After the model is trained, does unconditional sampling generate images randomly, rather than based on a given image? So, if I want to generate a normal image from a flawed image (without any annotations in the inference phase), should I use a conditional LDM? @Yoonho-Na @PedestrianZQZ @zhangdan8962 @Suimingzhe

A conditional LDM means that conditions (such as text, labels, images, etc.) exist during the training and inference stages. For example, when training a conditional LDM on the CIFAR-10 or ImageNet dataset, the condition can be the class label, which is converted to a label embedding and fed to the model. So I think you are right: if you want to generate a normal image, the flawed image should be the condition.
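
(For intuition, "converted to a label embedding" means something like the simplified sketch below; the repo uses a ClassEmbedder module for this, but the class and dimensions here are only illustrative.)

```python
import torch
import torch.nn as nn


class LabelEmbedder(nn.Module):
    """Simplified stand-in for a class-label conditioner: maps integer labels
    to embedding vectors that the diffusion UNet receives as context."""

    def __init__(self, n_classes: int, embed_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(n_classes, embed_dim)

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        # labels: (batch,) integer tensor -> (batch, 1, embed_dim) context
        return self.embedding(labels).unsqueeze(1)


labels = torch.tensor([3, 7])             # e.g. two CIFAR-10 class ids
context = LabelEmbedder(10, 512)(labels)  # shape: (2, 1, 512)
```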

ustczhouyu commented 1 year ago

> A conditional LDM means that conditions (such as text, labels, images, etc.) exist during the training and inference stages. For example, when training a conditional LDM on the CIFAR-10 or ImageNet dataset, the condition can be the class label, which is converted to a label embedding and fed to the model. So I think you are right: if you want to generate a normal image, the flawed image should be the condition.

Thanks for your reply; I have another question. My training set has normal and flawed images, labeled good and bad, and for the flawed images we also have masks for the flawed regions. For the test set we only have normal and flawed images and their good/bad labels. Since there are many training tasks, such as inpainting, semantic_synthesis, text2img, and so on, which config file do you think I should use for training? @Suimingzhe

Suimingzhe commented 1 year ago

> Thanks for your reply; I have another question. My training set has normal and flawed images, labeled good and bad, and for the flawed images we also have masks for the flawed regions. For the test set we only have normal and flawed images and their good/bad labels. Since there are many training tasks, such as inpainting, semantic_synthesis, text2img, and so on, which config file do you think I should use for training? @Suimingzhe

I do not think you can use these configs directly for your task. I haven't tried combining defect detection with diffusion models. In fact, nowadays there are some papers that use diffusion models for defect detection; I think you can just use their code.

ustczhouyu commented 1 year ago

Thanks again for your reply; I will try it. One last question: do you think my task calls for a conditional model? @Suimingzhe

Suimingzhe commented 1 year ago

> Thanks again for your reply; I will try it. One last question: do you think my task calls for a conditional model? @Suimingzhe

I am not clear on the pipeline for using diffusion models for defect detection tasks, so I can't give you an answer.

ustczhouyu commented 1 year ago

> I am not clear on the pipeline for using diffusion models for defect detection tasks, so I can't give you an answer.

Thank you very much.

kerkathy commented 1 year ago

> @Suimingzhe I went through a lot of trial and error training LDMs on a custom dataset and have successfully trained them now. Yes, those steps are correct.
>
> As for the second question, in my opinion that depends on your dataset. If your data is significantly different from the pre-training dataset, you won't achieve good performance. I think the best way to find out is to send your data through the pre-trained AE model and look at the reconstruction quality. If you are happy with the pre-trained reconstruction quality, then you are good to go to the 2nd stage.

Hi @Yoonho-Na,

I get that I should first train the AE until the results are satisfactory. However, I'm a bit confused about the next step: when training the LDM with the pre-trained AE, should the parameters of the AE be frozen so that only the weights of the LDM are trained? Thanks!

nicolasfischoeder commented 1 year ago

Hm, I think so.
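
As far as I can tell, the repo's LatentDiffusion class does keep the first-stage model frozen once it is loaded, so only the diffusion model is updated. Conceptually it amounts to something like this generic PyTorch snippet (a sketch, not the repo's actual code):

```python
import torch.nn as nn


def freeze_first_stage(autoencoder: nn.Module) -> nn.Module:
    """Put the pre-trained autoencoder in eval mode and stop gradient updates,
    so only the diffusion model's weights change during the second stage."""
    autoencoder.eval()
    for param in autoencoder.parameters():
        param.requires_grad = False
    return autoencoder
```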

taustudent commented 1 year ago

> @Suimingzhe
>
>   1. First, simply compare the reconstructed images with the actual inputs visually. If the reconstructions look quite different (meaning the pre-trained AE cannot compress and reconstruct your data well), then go train the AE on your custom dataset.
>   2. If the input and reconstructed images look similar enough, compare them with actual evaluation metrics such as MSE, SSIM, LPIPS, etc. There are plenty of metrics for comparing two images.
>   3. If you are satisfied with the metric results from step 2, train the LDM with that pre-trained AE. If not, train the AE on your custom data.

Hi, you said that if the reconstructed images look good enough, the AE is ready for LDM training? I'm training an AE; the reconstructed images look good, but the sampled images look bad. Another question: how do you run those evaluation metrics? Is that something in this LDM repo? I only see loss and AE_loss printed in the training loop. Thanks.

Yoonho-Na commented 1 year ago

Hi @taustudent,

  1. You don't need to care about the sampled images, because the autoencoder doesn't map to a particular Gaussian distribution. The autoencoder here isn't trained to sample new realistic images from a random Gaussian vector (it is not a standard VAE); it just learns a mapping from pixel space to latent space and vice versa. Since the actual generation is done by the diffusion model and the AE is just a mapping between latent space and pixel space, you only have to care about reconstruction quality. Maybe showing sampled images is just for experiments that vary the weight of the KL divergence term?

  2. I think just using AE_loss is OK, but I used some additional metrics (not in this repo) to be more confident when evaluating the AE; see the sketch below.
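
For example, a comparison could be run with external libraries such as scikit-image and the lpips package (neither is part of this repo, so treat the snippet as a sketch):

```python
import lpips                      # pip install lpips
import numpy as np
import torch
from skimage.metrics import mean_squared_error, structural_similarity


def compare(original: np.ndarray, reconstruction: np.ndarray) -> dict:
    """original / reconstruction: HWC uint8 arrays of the same shape."""
    mse = mean_squared_error(original, reconstruction)
    ssim = structural_similarity(original, reconstruction, channel_axis=-1)

    # LPIPS expects NCHW float tensors scaled to [-1, 1]
    def to_tensor(a):
        return torch.from_numpy(a.astype(np.float32) / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(0)

    lpips_fn = lpips.LPIPS(net="alex")
    with torch.no_grad():
        lpips_score = lpips_fn(to_tensor(original), to_tensor(reconstruction)).item()

    return {"MSE": mse, "SSIM": ssim, "LPIPS": lpips_score}
```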

taustudent commented 1 year ago

> Hi @taustudent,
>
>   1. You don't need to care about the sampled images, because the autoencoder doesn't map to a particular Gaussian distribution. The autoencoder here isn't trained to sample new realistic images from a random Gaussian vector (it is not a standard VAE); it just learns a mapping from pixel space to latent space and vice versa. Since the actual generation is done by the diffusion model and the AE is just a mapping between latent space and pixel space, you only have to care about reconstruction quality. Maybe showing sampled images is just for experiments that vary the weight of the KL divergence term?
>   2. I think just using AE_loss is OK, but I used some additional metrics (not in this repo) to be more confident when evaluating the AE.

Thanks! How do you know which AE config file to use for your own dataset? There are 4 KL configs and even more VQ configs in the repo. Are there any guidelines?

Yoonho-Na commented 1 year ago

@taustudent There's no such guideline. I just picked the config file that was closest to my dataset (resolution, etc.). You'll have to do some experiments, changing parameters in the config files for your own dataset.

taustudent commented 1 year ago

> @taustudent There's no such guideline. I just picked the config file that was closest to my dataset (resolution, etc.). You'll have to do some experiments, changing parameters in the config files for your own dataset.

Thanks. I see that there are config files both in ./configs/autoencoder and in ./models/first_stage_models, with small differences between them (the same goes for the LDM configs). Do you know which location I should take the config file from?

hellohahaw commented 10 months ago

> If I want to train an unconditional LDM on my custom dataset (without any conditioning), are these the right steps?
>
>   1. Train a VQGAN on the custom dataset to get a well-trained autoencoder.
>   2. Train an LDM with the pre-trained autoencoder.
>
> Many pre-trained autoencoders (VQ and KL) have been provided, but I'm not sure whether they can be used on a custom dataset so that the first stage can be skipped and one can go straight to the second stage.

If I want to train an unconditional LDM on my custom dataset, and I apply some image augmentations to my dataset, I want the LDM to generate images with similar augmentations. What should I do? Thanks.

hellohahaw commented 10 months ago

> Hello, I would like to ask about the difference between an unconditional LDM and a conditional LDM. After the model is trained, does unconditional sampling generate images randomly, rather than based on a given image? So, if I want to generate a normal image from a flawed image (without any annotations in the inference phase), should I use a conditional LDM? @Yoonho-Na @PedestrianZQZ @zhangdan8962 @Suimingzhe

@ustczhouyu Have you solved your problem? If I want to train an LDM to do image augmentation for my data, what should I do?

Suimingzhe commented 10 months ago

> If I want to train an unconditional LDM on my custom dataset, and I apply some image augmentations to my dataset, I want the LDM to generate images with similar augmentations. What should I do? Thanks.

As mentioned above, first check the reconstructed images with the pretrained VAE. If they look good, you just need to train an unconditional LDM on your data (for hyperparameters, you can follow the FFHQ experiments in the README). Finally, sample virtual data and train on it together with your real data for your task.
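
For the sampling part, the repo ships scripts/sample_diffusion.py for unconditional models; in code it boils down to roughly the following (the config/checkpoint paths and the latent shape are placeholders for your own trained model):

```python
import torch
from omegaconf import OmegaConf

from ldm.models.diffusion.ddim import DDIMSampler
from ldm.util import instantiate_from_config

# placeholder paths -- use the config and checkpoint of your trained unconditional LDM
config = OmegaConf.load("configs/latent-diffusion/ffhq-ldm-vq-4.yaml")
model = instantiate_from_config(config.model)
state = torch.load("logs/my_ldm/checkpoints/last.ckpt", map_location="cpu")
model.load_state_dict(state["state_dict"], strict=False)
model.eval()

sampler = DDIMSampler(model)
with torch.no_grad():
    # latent shape [channels, height, width] must match your first-stage model
    # (here: an f=4 autoencoder with 3-channel latents for 256x256 images)
    latents, _ = sampler.sample(S=200, batch_size=4, shape=[3, 64, 64],
                                conditioning=None, verbose=False, eta=1.0)
    images = model.decode_first_stage(latents)  # pixel space, values roughly in [-1, 1]
```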

rutuja1409 commented 3 months ago

> @Suimingzhe
>
>   1. First, simply compare the reconstructed images with the actual inputs visually. If the reconstructions look quite different (meaning the pre-trained AE cannot compress and reconstruct your data well), then go train the AE on your custom dataset.
>   2. If the input and reconstructed images look similar enough, compare them with actual evaluation metrics such as MSE, SSIM, LPIPS, etc. There are plenty of metrics for comparing two images.
>   3. If you are satisfied with the metric results from step 2, train the LDM with that pre-trained AE. If not, train the AE on your custom data.

Hello @Yoonho-Na, thank you for the detailed steps. Could you provide a script for reconstructing an image with the AE?

ultiwinter commented 2 months ago

> @Suimingzhe I went through a lot of trial and error training LDMs on a custom dataset and have successfully trained them now. Yes, those steps are correct.
>
> As for the second question, in my opinion that depends on your dataset. If your data is significantly different from the pre-training dataset, you won't achieve good performance. I think the best way to find out is to send your data through the pre-trained AE model and look at the reconstruction quality. If you are happy with the pre-trained reconstruction quality, then you are good to go to the 2nd stage.

Hello @Yoonho-Na and everyone,

Thank you very much for putting out the outline. I am stuck on actually feeding my images as input for training; would you mind sharing how you did it? And would it be possible to use AutoencoderKL instead?

beiqiqi commented 17 hours ago

> @DongyangHuLi The resolution was 1x256x256. I don't remember the exact time spent training the AE and the LDM; training time depends on the dataset, batch size, learning rate, and even the hardware. There are lots of parameters that affect training time. In my case it took a few hours to train the AE and about a day for the LDM. Memory is also affected by many parameters; mine took about 4x60 GB on 4 NVIDIA A100s. You should just go try training your LDMs. The LDM does the diffusion and denoising process in a much lower-dimensional latent space, so it won't use as much memory as you might expect.

Hello! Thank you for your help. Do you happen to remember your batch size? I'm now planning to train a VAE on a single A100 40GB GPU with an image resolution of 1x2048x2048. What batch size would work?