Closed yangjie1874 closed 2 days ago
Thank you for your interest in our work. Noisy generations can have many causes; one possibility is that the dataset is too small or the number of iterations is insufficient, so the model has not converged yet. During training, are the images in the 'image_log' folder whose names start with 'samples_gs' also noise?
I followed your suggestion and found that the images in the 'image_log' folder starting with 'samples_gs' are not noise, but the subsequent images do contain noise.
'samples_gs' is the normal output of the model. If the 'samples_gs' images are meaningful images instead of noise, the model already has the ability to generate images. What are the file names of the noisy "subsequent images"? Also, did you train only the LDM and not the ControlNet? And which script did you use to generate the images?
Hello, the script I use is generate_data_for_target_domain.py. On my own dataset, I first use train_wheat.py to train the diffusion model, and then use generate_data_for_target_domain.py to generate images with the diffusion model's weights. The configuration file is DODA_wheat_ldm_kl_4.yaml, with ckpt_path: "coco-128.ckpt". That is my process.
I assumed that using the LDM alone did not require training a ControlNet.
Yes, using the LDM does not require a ControlNet, and your process also looks correct.
The images in the 'image_log' folder record the training process. The file names indicate how each image was obtained, and the number after 'gs' is the global step. If your images look like the following examples, the training process is correct:
"inputs_gs-200000_e-000069_b-002246": the input images.
"reconstruction_gs-200000_e-000069_b-002246": obtained by encoding and decoding the input image with the VAE.
"progressive_row_gs-200000_e-000069_b-002246": starting from noise, the image is denoised step by step.
"samples_gs-200000_e-000069_b-002246": the output images.
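As a side note, the naming convention above can be checked mechanically when sorting through a large 'image_log' folder. A minimal sketch (the regex and the `parse_log_name` helper are my own illustration, not part of the repo):

```python
import re

# Matches image_log names like "samples_gs-200000_e-000069_b-002246":
# a kind prefix, then the global step, epoch, and batch counters.
LOG_NAME = re.compile(r"^(?P<kind>.+)_gs-(?P<step>\d+)_e-(?P<epoch>\d+)_b-(?P<batch>\d+)$")

def parse_log_name(name: str) -> dict:
    """Split an image_log file name into its kind and counters."""
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized image_log name: {name}")
    return {
        "kind": m["kind"],              # e.g. "samples", "reconstruction"
        "global_step": int(m["step"]),  # the number after 'gs'
        "epoch": int(m["epoch"]),
        "batch": int(m["batch"]),
    }

print(parse_log_name("samples_gs-200000_e-000069_b-002246"))
# {'kind': 'samples', 'global_step': 200000, 'epoch': 69, 'batch': 2246}
```

This makes it easy to, say, group the logged images by global step and compare 'reconstruction' against 'samples' at the same point in training.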
Did you make any changes to generate_data_for_target_domain.py or the config file?
Thank you very much for your patience. I trained on my own dataset. Compared with your examples, the images starting with 'reconstruction_gs-xxx-xxx-xxx' and all subsequent images are noise. For generate_data_for_target_domain.py and the config file, I only adjusted the dataset path and my weights, so I think the problem is in training.
Is it possible that the problem is that I didn't use pre-trained weights? ckpt_path: "models/kl-f4-wheat.ckpt"
Yes, the LDM uses a VAE to encode images into latent space. Sorry, I didn't upload a separate VAE before because those weights are included in DODA-ldm. You can use the VAE weights here: https://drive.google.com/file/d/1XHmtZR95uSbFcY-y6wCffgV5uUM1x8pC/view?usp=sharing, or train a VAE yourself using configs\autoencoder\DODA_wheat_autoencoder_kl_64x64x3.yaml as the config.
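For anyone following along: in standard latent-diffusion configs, the downloaded VAE weights are usually wired in via the first-stage model's `ckpt_path`. A hypothetical sketch of what that section could look like; the exact key layout and values below are assumptions based on generic LDM configs, not verified against DODA_wheat_ldm_kl_4.yaml:

```yaml
# Sketch only: check the actual structure of DODA_wheat_ldm_kl_4.yaml.
model:
  params:
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        ckpt_path: "models/kl-f4-wheat.ckpt"  # path to the downloaded VAE weights
```

If the first-stage VAE weights are missing or untrained, the reconstructions (and everything downstream) will come out as noise, which matches the symptom described above.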
Thank you so much! By the way, do I need to re-train the VAE on my own dataset, or can I just use yours?
In the original LDM paper (and early versions of Stable Diffusion), the VAEs used for different datasets were all trained on OpenImages, so if your dataset is not very different from Global Wheat, you can use the pre-trained VAE, or fine-tune it on your data.
Hello, as shown in the figure, in the images generated by generate_data_for_target_domain.py, the label positions I want are very random and almost always wrong. I would like to know the reason.
I mean there is nothing wrong with the images themselves, but the labels don't match them.
That's a different question; please describe it in issue #3.
Hello author, I am amazed by your work. When I trained the generative model, I used the LDM method on 12,000 416×416 images without pre-trained weights. When generating images, why are all the generated images noise?