facebookresearch / stable_signature

Official implementation of the paper "The Stable Signature: Rooting Watermarks in Latent Diffusion Models"

Cannot get the loss reduction shown in your log.txt #24

Closed Vizzana closed 2 months ago

Vizzana commented 2 months ago

Dear Authors,

I am trying to reproduce the results reported in the Stable Signature paper. When training the bit encoder and extractor (the HiDDeN part), I cannot match the loss curve in the log you provided.

Specifically, I used all images in the COCO2014 training set for training and the COCO2014 test set for validation, and my command is:
CUDA_VISIBLE_DEVICES=1,2 torchrun --nproc_per_node=2 main.py --val_dir datasets/coco_dataset/test2014 --train_dir datasets/coco_dataset/train2014 --output_dir output/output_bit/4 --eval_freq 5 --img_size 256 --num_bits 48 --batch_size 16 --epochs 300 --scheduler CosineLRScheduler,lr_min=1e-6,t_initial=300,warmup_lr_init=1e-6,warmup_t=5 --optimizer Lamb,lr=2e-2 --p_color_jitter 0.0 --p_blur 0.0 --p_rot 0.0 --p_crop 1.0 --p_res 1.0 --p_jpeg 1.0 --scaling_w 0.3 --scale_channels False --attenuation none --loss_w_type bce --loss_margin 1 --local_rank 0 --dist True --workers 6
which is the same command as the one provided in the README.md file.

My problem is that I cannot get the loss reduction shown in your log.txt (https://dl.fbaipublicfiles.com/ssl_watermarking/logs_replicate.txt). The train_loss_w does not decrease as fast as in your log; it cannot even reach 0.2 after 210 epochs of training.

I don't know how to make the loss decrease as quickly as your log shows.

I'm looking forward to your reply.

pierrefdz commented 2 months ago

Hi, can you check on your side whether the learning rate in your log matches ours? Could you also share your logs? The learning rate is rescaled depending on the world size, so that may be the issue.
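For context, here is a minimal sketch of the kind of scaling rule involved (an assumption about the setup, not necessarily the exact logic in this repo's main.py): many distributed PyTorch trainings scale the base learning rate linearly with the number of processes, so the same --optimizer Lamb,lr=2e-2 flag yields a different effective learning rate depending on how many GPUs you launch with.

import torch.distributed as dist

def effective_lr(base_lr: float) -> float:
    # Linear lr scaling with world size (assumed rule; check main.py).
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return base_lr * world_size

# Under this rule, --optimizer Lamb,lr=2e-2 launched on 2 GPUs gives an
# effective lr of 4e-2, which would explain a learning-rate curve that
# differs from the reference logs_replicate.txt.

Comparing the learning-rate values printed in your log against the reference log is the quickest way to confirm which rule your version of the code actually applies.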

KhadgaA commented 2 months ago

I am also trying to reproduce the results, and I just checked my runs: yes, there is an lr mis-scaling due to my world size. @pierrefdz, I think you should add a disclaimer to the README.

Vizzana commented 2 months ago

Hi, this is my log, with the training parameters as I described above.

log.txt

I compared the training_lr with yours; there is a difference.

Maybe I need to adjust the multi-GPU training parameters? In fact, I am not very familiar with multi-GPU training in PyTorch.

Vizzana commented 2 months ago

Solved, thank you.

jinganglang567 commented 1 month ago

@Vizzana Sorry to bother you, but I met the same problem. Could you tell me how you solved it?

Vizzana commented 1 month ago

Adjust the learning rate in your code so that it follows the trend shown in the log the authors uploaded.
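Concretely, assuming the linear world-size scaling discussed above (a hypothetical helper, not code from this repo), you can pick the lr value to pass on the command line so that the effective learning rate matches the reference log:

def lr_flag_for_target(target_lr: float, world_size: int) -> float:
    # Return the lr to pass via --optimizer so that, after the assumed
    # linear scaling by world size, the effective lr equals target_lr.
    return target_lr / world_size

# Example: for an effective lr of 2e-2 when launching on 4 GPUs,
# pass --optimizer Lamb,lr=5e-3 (since 5e-3 * 4 == 2e-2).
print(lr_flag_for_target(2e-2, 4))  # 0.005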


jinganglang567 commented 1 month ago

Thank you, I will try that. @Vizzana

jinganglang567 commented 1 month ago

https://github.com/facebookresearch/stable_signature/issues/24#issuecomment-2233003736 Sorry to bother you again. I used 4 GPUs and my log looks like yours; I want to know how to choose the right learning rate. @Vizzana