[ICLR 2024] Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach Link: https://arxiv.org/abs/2401.15652
I ran into a problem when training on a single RTX 4090. After 36k training steps, some sub-images in the predicted target come out black. The learning rate is set to 5e-5 and the batch size is 64.
Can you give me some advice?
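A minimal sketch of a check that might help narrow this down, assuming (not confirmed here) that the black sub-images correspond to NaN or inf values in the predicted output, which can happen under fp16 mixed precision. The function name `find_nonfinite_subimages` and the toy batch are hypothetical and not part of the repository. In a real training loop the values would come from the model output, for example via `tensor.flatten().tolist()` per sub-image.

```python
import math
from typing import Dict, Sequence


def find_nonfinite_subimages(batch: Sequence[Sequence[float]]) -> Dict[int, int]:
    """Return {sub-image index: number of NaN/inf values} for affected sub-images.

    `batch` is a sequence of flattened sub-images (hypothetical layout).
    Sub-images with only finite values are not included in the result.
    """
    report: Dict[int, int] = {}
    for idx, values in enumerate(batch):
        # Count entries that are NaN, +inf, or -inf.
        bad = sum(1 for v in values if not math.isfinite(v))
        if bad:
            report[idx] = bad
    return report


if __name__ == "__main__":
    # Hypothetical toy batch: three flattened sub-images, the second one corrupted.
    toy_batch = [
        [0.1, -0.3, 0.7],
        [float("nan"), 0.2, float("inf")],
        [0.0, 0.5, -0.9],
    ]
    print(find_nonfinite_subimages(toy_batch))
```

If the report is empty for a batch that still decodes to black sub-images, the cause is probably something other than numerical overflow.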
[Images: predicted target, decode target, prime target]
The training log is here:

2024-08-20 20:54:35,177 - train_ldm.py - autoencoder:
  pretrained_path: assets/stable-diffusion/autoencoder_kl.pth
ckpt_root: workdir/flickr192_large/noise_pred_20240820_80004/ckpts
config_name: flickr192_large
dataset:
  embed_dim: 1024
  grid_size: 12
  name: flickr
  path: ./dataset/scenery/train_ori/
  resolution: 192
hparams: noise_pred_20240820_80004
lr_scheduler:
  name: customized
  warmup_steps: 20000
mixed_precision: fp16
nnet:
  depth: 20
  embed_dim: 1024
  img_size: 24
  in_chans: 4
  mlp_ratio: 4
  mlp_time_embed: false
  name: uvit
  num_classes: 1001
  num_heads: 16
  patch_size: 2
  qkv_bias: false
  use_checkpoint: true
optimizer:
  betas: !!python/tuple