Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License
38.83k stars 5.01k forks

t2i training problem #225

Closed · sunmeng7 closed this issue 1 year ago

sunmeng7 commented 1 year ago

[sample images attached] I train the diffusion stage using my own dataset, but the sampling results come out as shown above. How can I solve this?

My config is as follows:

model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: caption
    image_size: 32
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        use_fp16: False
        image_size: 256
        in_channels: 4
        out_channels: 4
        model_channels: 128
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 64
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ckpt_path: 'logs/2023-03-23_autoencoder_kl_32x32x4/checkpoints/epoch=000028.ckpt'
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 4
    num_workers: 0
    wrap: false
    train:
      target: ldm.data.imagenet3.MutilVox
      params:
        size: 256
        split: "train"
    validation:
      target: ldm.data.imagenet3.MutilVox
      params:
        size: 256
        split: "valid"

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 5000 
        max_images: 8
        increase_log_steps: False

  trainer:
    benchmark: True
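
For context, first_stage_key: image and cond_stage_key: caption mean the training dataset (the custom ldm.data.imagenet3.MutilVox above) has to yield dicts with exactly those keys; the datasets shipped with ldm return HWC float32 images scaled to [-1, 1]. Below is a minimal sketch of such a dataset with a hypothetical file layout (the real MutilVox may differ); the size and split values in the data section are passed straight into the constructor.

# Hypothetical minimal text-to-image dataset for main.DataModuleFromConfig.
# Each item provides the keys named by first_stage_key ("image") and
# cond_stage_key ("caption") in the model config above.
import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class CaptionedImageDataset(Dataset):
    def __init__(self, data_root="data/myset", size=256, split="train"):
        # Assumed layout: <data_root>/<split>.txt with lines "relative/path.jpg<TAB>caption"
        self.size = size
        self.data_root = data_root
        with open(os.path.join(data_root, f"{split}.txt")) as f:
            self.items = [line.rstrip("\n").split("\t") for line in f if line.strip()]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        path, caption = self.items[i]
        image = Image.open(os.path.join(self.data_root, path)).convert("RGB")
        image = image.resize((self.size, self.size), resample=Image.BICUBIC)
        # HWC float32 in [-1, 1], matching the convention of the ldm datasets
        image = np.array(image).astype(np.float32) / 127.5 - 1.0
        return {"image": image, "caption": caption}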
RicoJYang commented 1 year ago

Hello, can you share how to train sd2? For example, which .py file did you use?

sunmeng7 commented 1 year ago

@RicoJYang It is also the main.py file. Following the latent diffusion work, run: python main.py --base configs/latent-diffusion/.yaml -t --gpus 0,

What I ran was: nohup python main.py --base configs/autoencoder/ffhq_vq_taming.yaml -t --gpus 0, --num_workers 0 >> 0402_vq_ffhq.log 2>&1 &

RicoJYang commented 1 year ago

I see that sd2 does not provide a main.py file, and ldm's main.py cannot be used directly. How should I modify ldm's main.py, or where can I find this main.py? Thanks!

sunmeng7 commented 1 year ago

Use the main file from ldm. Compared with ldm, the config file seems to only change the second-stage cond_stage_config: part; it runs the same way as ldm.
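
As a quick check of that cond_stage_config change, here is a sketch (it assumes open_clip can fetch the laion2b ViT-H-14 weights, and the caption is just an example): the frozen OpenCLIP text encoder produces [batch, 77, 1024] embeddings, which is why the UNet sets context_dim: 1024.

# Sketch: verify the text-conditioning shape produced by the SD2 cond stage.
import torch
from ldm.modules.encoders.modules import FrozenOpenCLIPEmbedder

encoder = FrozenOpenCLIPEmbedder(freeze=True, layer="penultimate", device="cpu")
with torch.no_grad():
    context = encoder.encode(["an example caption"])
print(context.shape)  # expected: torch.Size([1, 77, 1024])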

emily-swatchon commented 1 year ago

@sunmeng7 Hi, were you able to solve this issue? In my case, the first training step (of fine-tuning with my own dataset) gives me output close to random noise, which is not expected since I used the pretrained weights (v2-1_512-ema-pruned.ckpt) as a starting point.

sunmeng7 commented 1 year ago

@emily-swatchon Hello, does your 'first training step' mean 'first_stage_config'? I think 'v2-1_512-ema-pruned.ckpt' is the model for 'cond_stage_config'. You can use '--resume' with the path to 'v2-1_512-ema-pruned.ckpt'; then training starts from the pretrained model.
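
If --resume does not fit your setup, another option (a sketch with assumed paths, not something from this thread) is to load the released state_dict into the configured LatentDiffusion model before fine-tuning; the DDPM base class also accepts ckpt_path / ignore_keys under model.params for the same purpose. Note that the UNet hyperparameters have to match the checkpoint (the released v2 configs use model_channels: 320 and num_head_channels: 64), otherwise most weights end up in the missing/unexpected lists.

# Sketch (assumed paths): start fine-tuning from the SD 2.1 weights by loading
# the checkpoint's state_dict into the model built from the config.
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

config = OmegaConf.load("configs/stable-diffusion/v2-inference.yaml")  # assumed config
model = instantiate_from_config(config.model)

state = torch.load("checkpoints/v2-1_512-ema-pruned.ckpt", map_location="cpu")["state_dict"]
missing, unexpected = model.load_state_dict(state, strict=False)
print(f"{len(missing)} missing keys, {len(unexpected)} unexpected keys")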

hotelbread commented 1 year ago

Which file did you use for training on your own dataset? Although I currently have limited knowledge about Stable Diffusion, I am eager to study and learn more about it. What is ldm's main.py, and how do I customize it to train Stable Diffusion?

sunmeng7 commented 1 year ago

@hotelbread In my opinion, the structure of Stable Diffusion is similar to that of Latent Diffusion, so I train Stable Diffusion using the main file from Latent Diffusion. You then need to make some changes according to the configuration file from Stable Diffusion. I actually ended up running the ldm model, so I don't know much more about the sd model.

hotelbread commented 1 year ago

Thanks for your answer! I'm going to check this out. Could I get the config file you used for training, along with your main.py? If you provide those files, they will serve as valuable learning resources for me during training.

My email is seotae89@gmail.com. Thank you!