UTokyo-FieldPhenomics-Lab / DODA

Diffusion for Object-detection Domain Adaptation in Agriculture

ldm model question #2

Closed. yangjie1874 closed this issue 2 days ago.

yangjie1874 commented 6 days ago

Hello author, I am amazed by your work. I trained the generative model with the LDM method on 12,000 416×416 images, without pre-trained weights. When generating images, why are all the generated images pure noise?

illrayy commented 6 days ago

Thank you for your interest in our work. Noisy generations can have many different causes; one possibility is that the dataset is too small or the number of iterations is too low, so the model hasn't converged yet. During training, are the images in the 'image_log' folder that start with 'samples_gs' also noise?

yangjie1874 commented 5 days ago

I followed your suggestion and found that the images in the 'image_log' folder starting with 'samples_gs' are not noise, but the subsequent images are.

illrayy commented 5 days ago

'samples_gs' is the normal output of the model. If the 'samples_gs' images are meaningful instead of noise, it means the model already has the ability to generate images. What are the file names of the noisy "subsequent images"? Also, did you only train the LDM, without ControlNet? And which script did you use to generate the images?

yangjie1874 commented 5 days ago

Hello, the script I use is generate_data_for_target_domain.py, on my own dataset. I first use train_wheat.py to train the diffusion model, and then use generate_data_for_target_domain.py to generate images with the diffusion model's weights. The configuration file is DODA_wheat_ldm_kl_4.yaml, with ckpt_path: "coco-128.ckpt". That is my process.
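(For reference, a minimal sketch of how such a checkpoint is typically loaded through the YAML config in latent-diffusion-style codebases; `instantiate_from_config` and the exact paths below are assumptions based on the standard LDM layout, not confirmed against DODA's scripts.)

```python
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config  # standard latent-diffusion helper (assumed)

# Build the model from the YAML config mentioned in this thread, then load
# the checkpoint weights. Paths are illustrative.
config = OmegaConf.load("configs/DODA_wheat_ldm_kl_4.yaml")
model = instantiate_from_config(config.model)

state = torch.load("coco-128.ckpt", map_location="cpu")
sd = state.get("state_dict", state)  # Lightning checkpoints nest weights under "state_dict"
missing, unexpected = model.load_state_dict(sd, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
model.eval()
```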

yangjie1874 commented 5 days ago

I thought that using LDM did not require ControlNet training.

illrayy commented 4 days ago

Yes, using the LDM alone does not require ControlNet, and your process also looks correct.

The images in the 'image_log' folder record the training process. The file names indicate how each image was obtained, and the number after 'gs' is the global step. If your images match the following examples, then the training process is correct.

"inputs_gs-200000_e-000069_b-002246"

inputs_gs-200000_e-000069_b-002246

"reconstruction_gs-200000_e-000069_b-002246", obtained by using VAE to encode and decode input image.

reconstruction_gs-200000_e-000069_b-002246

"progressive_row_gs-200000_e-000069_b-002246", strat form the noise, denoising images step by step

progressive_row_gs-200000_e-000069_b-002246

"samples_gs-200000_e-000069_b-002246", output images

samples_gs-200000_e-000069_b-002246
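(For convenience, here is a small sketch for sorting through an 'image_log' folder by parsing these file names. The folder path, the ".png" extension, and the parsing helper are illustrative assumptions, not part of DODA itself.)

```python
import re
from pathlib import Path

# Parse image_log file names such as "samples_gs-200000_e-000069_b-002246"
# into (kind, global step, epoch, batch index), following the pattern above.
PATTERN = re.compile(r"(?P<kind>[a-z_]+?)_gs-(?P<gs>\d+)_e-(?P<e>\d+)_b-(?P<b>\d+)")

def parse_log_name(stem: str):
    m = PATTERN.fullmatch(stem)
    return None if m is None else (m["kind"], int(m["gs"]), int(m["e"]), int(m["b"]))

# List the 'samples' images from newest global step to oldest, to see at a
# glance whether generations improve over training.
samples = []
for p in Path("image_log").glob("*.png"):  # adjust path/extension to your logger
    info = parse_log_name(p.stem)
    if info is not None and info[0] == "samples":
        samples.append((info[1], p.name))
for gs, name in sorted(samples, reverse=True):
    print(gs, name)
```
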
illrayy commented 4 days ago

Did you make any changes to generate_data_for_target_domain.py or the config file?

yangjie1874 commented 4 days ago

Thank you very much for your patience. I trained on my own dataset. Compared with your example, the images starting with 'reconstruction_gs-xxx' and everything after them are all noise in my run. For generate_data_for_target_domain.py and the config file, I just adjusted the dataset path and my weights, so I think the problem occurs during training.

yangjie1874 commented 4 days ago

Is it possible that this is because I didn't use the pre-trained weights, i.e. ckpt_path: "models/kl-f4-wheat.ckpt"?

illrayy commented 4 days ago

Yes. The LDM uses a VAE to encode images into the latent space. Sorry, I didn't upload a separate VAE before, because the weights for this part are included in DODA-ldm. You can use the VAE weights here: https://drive.google.com/file/d/1XHmtZR95uSbFcY-y6wCffgV5uUM1x8pC/view?usp=sharing, or train a VAE yourself using configs\autoencoder\DODA_wheat_autoencoder_kl_64x64x3.yaml as the config.
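(A quick way to check whether the VAE is the problem is to run an encode/decode round trip on one of your training images; if the round trip is already noise, the LDM can never produce clean outputs. A minimal sketch, assuming the latent-diffusion AutoencoderKL API and illustrative file paths:)

```python
import numpy as np
import torch
from PIL import Image
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Load the VAE from its config and checkpoint (paths illustrative).
cfg = OmegaConf.load("configs/autoencoder/DODA_wheat_autoencoder_kl_64x64x3.yaml")
vae = instantiate_from_config(cfg.model)
state = torch.load("models/kl-f4-wheat.ckpt", map_location="cpu")
vae.load_state_dict(state.get("state_dict", state), strict=False)
vae.eval()

# Encode/decode one training image and save the reconstruction.
img = Image.open("example_wheat.png").convert("RGB").resize((416, 416))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)                        # HWC -> NCHW

with torch.no_grad():
    posterior = vae.encode(x)              # DiagonalGaussianDistribution in LDM's API
    recon = vae.decode(posterior.sample())

out = ((recon.clamp(-1, 1) + 1) * 127.5)[0].permute(1, 2, 0).byte().cpu().numpy()
Image.fromarray(out).save("recon_check.png")  # compare side by side with the input
```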

yangjie1874 commented 4 days ago

Thanks so much! By the way, do I need to re-train the VAE on my own dataset, or can I just use yours?

illrayy commented 4 days ago

In the original LDM paper (and in early versions of Stable Diffusion), the VAEs used across different datasets were all trained on OpenImages, so if your dataset is not very different from Global Wheat, you can use the pre-trained VAE, or fine-tune it.
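(If you want a number rather than eyeballing it: measure the reconstruction quality of the pre-trained VAE on a few of your images, e.g. with PSNR, and only retrain or fine-tune if it is poor. The helper and the rough 25 dB rule of thumb below are illustrative assumptions, not a threshold from the authors.)

```python
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """PSNR between two uint8 images of the same shape; higher is better."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

# e.g. compare the input with the "recon_check.png" from the sketch above;
# roughly, >25 dB suggests the VAE handles your domain well enough to reuse.
original = np.array(Image.open("example_wheat.png").convert("RGB").resize((416, 416)))
recon = np.array(Image.open("recon_check.png"))
print(f"PSNR: {psnr(original, recon):.1f} dB")
```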

yangjie1874 commented 3 days ago

Hello, as shown in the attached figures, in the images generated by generate_data_for_target_domain.py, the object positions are almost random and rarely match the labels I want. I would like to know the reason.

yangjie1874 commented 2 days ago

I mean there is nothing wrong with the images themselves, but the labels don't match them.

illrayy commented 2 days ago

That's a different question. Please describe it in issue #3.