facebookresearch / stable_signature

Official implementation of the paper "The Stable Signature: Rooting Watermarks in Latent Diffusion Models"

Some problem in your hidden file #7

Closed asdcaszc closed 9 months ago

asdcaszc commented 10 months ago

Your loss function seems different between your code and the paper. In main.py, Line 338 to Line 356, the gradient-descent step is not the same as in the paper. In addition, you call optimizer.zero_grad() twice in your code:

loss = 0 (Line 338)
optimizer.zero_grad() (Line 339)
loss += params.lambda_w*loss_w + params.lambda_i*loss_i (Line 351)
optimizer.zero_grad() (Line 256)

Besides, I find that your logging code doesn't show the average result but only the last-batch result. I ran it on my server, and the output looks like the following.

Train - Epoch: [0/410] [624/625] eta: 0:00:00 loss_w: 0.684652 (0.692138) loss_i: 0.016432 (0.011972) loss: 0.684652 (0.692138) psnr_avg: 31.326965 (32.762313) lr: 0.000031 (0.000031) bit_acc_avg: 0.550781 (0.518100) word_acc_avg: 0.000000 (0.000000) norm_avg: 1.958761 (1.096521) time: 0.251007 data: 0.001385 max mem: 8842
Averaged train stats: loss_w: 0.684652 (0.692138) loss_i: 0.016432 (0.011972) loss: 0.684652 (0.692138) psnr_avg: 31.326965 (32.762313) lr: 0.000031 (0.000031) bit_acc_avg: 0.550781 (0.518100) word_acc_avg: 0.000000 (0.000000) norm_avg: 1.958761 (1.096521)

Finally, I trained my own model using the recommended command from your hidden/readme file:

torchrun --nproc_per_node=8 main.py \
  --val_dir path/to/coco/test2014/ --train_dir path/to/coco/train2014/ --output_dir output --eval_freq 5 \
  --img_size 256 --num_bits 48 --batch_size 16 --epochs 300 \
  --scheduler CosineLRScheduler,lr_min=1e-6,t_initial=300,warmup_lr_init=1e-6,warmup_t=5 --optimizer Lamb,lr=2e-2 \
  --p_color_jitter 0.0 --p_blur 0.0 --p_rot 0.0 --p_crop 1.0 --p_res 1.0 --p_jpeg 1.0 \
  --scaling_w 0.3 --scale_channels False --attenuation none \
  --loss_w_type bce --loss_margin 1

The result is not as good as yours: the bit accuracy cannot get above 80%. The dataset setup follows the original backbone paper, "HiDDeN: Hiding Data With Deep Networks". I know our training and test sets may be quite different, because my dataset only contains 10,000 random images from the COCO training set and 1,000 random images from the COCO test set. However, if the dataset is the only difference, the accuracy should not drop this much. Do you train on the whole COCO dataset rather than a randomly chosen 10,000 images?
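
For context, here is a hypothetical sketch of how such a random subset could be built; the paths, seed, and file pattern are placeholders, not the exact setup used in this issue.

```python
import random
import shutil
from pathlib import Path

# Hypothetical subset construction (placeholder paths): 10,000 random images
# from COCO train2014 and 1,000 from test2014, copied into a separate folder.
random.seed(0)
for split, n in (("train2014", 10_000), ("test2014", 1_000)):
    src = Path("path/to/coco") / split
    dst = Path("path/to/coco_subset") / split
    dst.mkdir(parents=True, exist_ok=True)
    for img in random.sample(sorted(src.glob("*.jpg")), n):
        shutil.copy(img, dst / img.name)
```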

pierrefdz commented 10 months ago

Hi, thanks for pointing this out.

Your loss function seems [...] the gradient-descent step is not the same as in the paper.

Can you give more details on how they differ?

In addition, you call optimizer.zero_grad() twice in your code.

This is a leftover from an initial version of the code that used gradient accumulation, where I had to initialize the loss before entering the epoch loop. I will remove it to make the code clearer. It does not change anything with regards to the optimization, though.
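
For readers who haven't seen the pattern, here is a minimal toy sketch of gradient accumulation; the model, data, and accum_steps below are illustrative placeholders, not the actual objects in hidden/main.py.

```python
import torch
import torch.nn as nn

# Toy gradient-accumulation loop (illustrative only): gradients from several
# mini-batches are accumulated before a single optimizer step.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
accum_steps = 4

optimizer.zero_grad()
for it in range(16):
    x, y = torch.randn(2, 8), torch.randn(2, 1)               # stand-in mini-batch
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale per micro-batch
    loss.backward()                                           # gradients add up in .grad
    if (it + 1) % accum_steps == 0:
        optimizer.step()                                      # one update per group
        optimizer.zero_grad()                                 # reset for the next group
```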

Besides, I find that your logging code [...]

The average value is the one in parentheses. Otherwise, the metric logger is taken from https://github.com/facebookresearch/dino/blob/main/utils.py; you may reach out to the authors if you want more information on it.
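
As a rough illustration of that convention (not the actual DINO logger), a smoothed-value meter typically prints a recent value followed by the global average in parentheses:

```python
from collections import deque

# Toy "smoothed value": prints a recent (windowed) value and, in parentheses,
# the global average over all updates. Illustrative only; the real logger lives
# in the DINO utils.py linked above.
class SmoothedValue:
    def __init__(self, window=20):
        self.window = deque(maxlen=window)
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.window.append(value)
        self.total += value
        self.count += 1

    def __str__(self):
        recent = sum(self.window) / len(self.window)
        return f"{recent:.6f} ({self.total / self.count:.6f})"

loss_w = SmoothedValue()
for v in (0.70, 0.69, 0.68):
    loss_w.update(v)
print(f"loss_w: {loss_w}")   # -> "loss_w: 0.690000 (0.690000)"
```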

Finally, I trained my own model using the recommended [...]

I train on the whole COCO dataset, as mentioned in the paper.

asdcaszc commented 10 months ago

Thanks for your reply. I will modify it following your advice. As for the loss problem: at Line 351 in hidden/main.py, loss += params.lambda_w*loss_w + params.lambda_i*loss_i accumulates the current loss onto the previous one. At Line 380, metric_logger.update(**{name: loss}) logs that value, but the loop then enters the next iteration and adds the next loss on top. For example, in iteration 0, with the initial loss = 0, lambda_w = 1, lambda_i = 0, loss_w = 0.5 and loss_i = 0.1, we get loss += 1*0.5 + 0*0.1, so loss = 0.5; in iteration 1, with loss_w = 0.4 and loss_i = 0.1, we get loss += 1*0.4 + 0*0.1, so loss = 0.5 + 0.4 = 0.9.

I think your initial code did gradient accumulation inside the iteration loop and you forgot to change it.
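
For concreteness, a minimal toy sketch of the corrected per-iteration pattern, building the loss fresh with '=' instead of '+='; the model, data, and lambda values are placeholders, not the real hidden/main.py code.

```python
import torch
import torch.nn as nn

# Toy training step with the loss built fresh each iteration, so the logged
# value is the true per-batch loss. Placeholder model, data, and lambdas only.
model = nn.Linear(8, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
lambda_w, lambda_i = 1.0, 0.0

for it in range(4):
    x, target = torch.randn(2, 8), torch.rand(2, 8)           # targets in [0, 1]
    optimizer.zero_grad()
    out = model(x)
    loss_w = nn.functional.binary_cross_entropy_with_logits(out, target)
    loss_i = nn.functional.mse_loss(out, x)
    loss = lambda_w * loss_w + lambda_i * loss_i               # '=' not '+=': no carry-over
    loss.backward()
    optimizer.step()
    print(f"iter {it}: loss {loss.item():.4f}")                # true per-batch value
```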

pierrefdz commented 10 months ago

Yes, I agree this is very ugly, thanks for noticing it! (But luckily, it does not impact anything: the gradients are still the same thanks to optimizer.zero_grad(), and the loss is recomputed just before logging...)

I will update this when I have the time. Don't hesitate to reach out to me if you need more info or if you find anything else.