facebookresearch / stable_signature

Official implementation of the paper "The Stable Signature: Rooting Watermarks in Latent Diffusion Models"

finetuning ldm decoder: noisy output #22

Closed rahimentezari closed 2 months ago

rahimentezari commented 4 months ago

Hi, I wanted to fine-tune the LDM decoder and have a few questions:

  1. There are some missing parameters in finetune_ldm_decoder.py compared to https://justpaste.it/cse0x, for example `"lambda_mse": 0.5` and `"lambda_lpips": 1`. Should we remove them from the param list?
  2. As training goes on, even after 6K iterations (you use only 100 iterations, right?), I get noisy outputs. [attached images: 6000_train_d0, 6000_train_orig, 6000_train_w] Here are my configs:

```
python finetune_ldm_decoder.py --num_keys 1 \
    --ldm_config configs/v2-inference.yaml \
    --ldm_ckpt v2-1_512-ema-pruned.ckpt \
    --msg_decoder_path dec_48b.pth \
    --decoder_depth 8 \
    --decoder_channels 64 \
    --loss_i "watson-vgg" \
    --loss_w "bce" \
    --lambda_i 0.2 \
    --lambda_w 1.0 \
    --optimizer "AdamW,lr=5e-4" \
    --train_dir coco2014/train2014 \
    --val_dir coco2014/test2014 \
    --steps 10000 \
    --warmup_steps 100 \
    --batch_size 16
```

  3. If I want to switch to another decoder, can I still use the same trained HiDDeN extractor? I gave it a try with another decoder with `z_channels=8` and I am getting noisy train_w images (purple images):

```
Train [ 760/1000] eta: 0:01:45 iteration: 750.000000 (380.000000) loss: 0.190190 (0.414728) loss_w: 0.051896 (0.189214) loss_i: 0.692773 (1.127573) psnr: 25.542080 (inf) bit_acc_avg: 1.000000 (0.927698) word_acc_avg: 1.000000 (0.459921) lr: 0.000076 (0.000321) time: 0.428403 data: 0.000091 max mem: 42627
```
pierrefdz commented 4 months ago

Hi,

  1. Yes.
  2. The 6000_train_d0 images are decoded by the original decoder (D_o), which is not changed during optimization, so I would say the issue is not in the fine-tuning. Does it only happen after some fine-tuning steps? Can you try to encode and decode an image and see what it looks like? (See the sketch after this list for a minimal round-trip test.)
  3. Yes, you should be able to switch decoders, as they are independent of the extractor (in the paper, we fine-tuned other decoders, like the ones used for inpainting or super-resolution, which differ from the original one).
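For reference, here is a minimal sketch of such a round-trip test through the frozen autoencoder, assuming the repo's `utils_model.load_model_from_config` helper and the same checkpoint/config pair as in the command above. If the reconstruction is already noisy here, the problem is upstream of the fine-tuning:

```python
# Round-trip through the *frozen* LDM autoencoder (no fine-tuning involved).
# Assumptions: utils_model.load_model_from_config as in this repo, and the
# v2-1 checkpoint/config pair used in the training command.
import torch
from omegaconf import OmegaConf
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

from utils_model import load_model_from_config  # repo helper (assumed)

config = OmegaConf.load("configs/v2-inference.yaml")
ldm = load_model_from_config(config, "v2-1_512-ema-pruned.ckpt")
ae = ldm.first_stage_model.eval().to("cuda")  # KL autoencoder: encoder + D_o

tf = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map to [-1, 1]
])
img = tf(Image.open("sample.png").convert("RGB")).unsqueeze(0).to("cuda")

with torch.no_grad():
    z = ae.encode(img).mode()  # deterministic latent (posterior mode)
    rec = ae.decode(z)         # reconstruction from the original decoder D_o

# side-by-side: original | reconstruction, mapped back to [0, 1]
save_image(torch.cat([img, rec]).clamp(-1, 1) * 0.5 + 0.5, "roundtrip.png")
```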

You can also share the full logs and code to reproduce.
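On point 3, a quick way to see that the extractor is a standalone network, independent of whichever decoder was fine-tuned, is to run it directly on an image. A sketch, assuming the whitened torchscript extractor `dec_48b_whit.torchscript.pt` from this repo and the ImageNet normalization used in its decoding example:

```python
# The watermark extractor is a standalone network: image in, 48 logits out.
# Assumptions: the whitened torchscript extractor from this repo and
# ImageNet normalization, as in the README's decoding example.
import torch
from PIL import Image
from torchvision import transforms

msg_extractor = torch.jit.load("dec_48b_whit.torchscript.pt").to("cuda")

tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
img = tf(Image.open("watermarked.png").convert("RGB")).unsqueeze(0).to("cuda")

with torch.no_grad():
    msg = msg_extractor(img)  # shape (1, 48): one logit per bit

bits = (msg > 0).squeeze().cpu().numpy().astype(int)
print("".join(map(str, bits)))  # recovered 48-bit message
```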