THUDM / ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Apache License 2.0

Strange training dynamics for ImageReward model. #64

Open bhattg opened 1 year ago

bhattg commented 1 year ago

Hi! I am trying to train a reward model, and I am confused: in the initial iterations of training, neither the gradients nor the loss change. Only after some steps do they suddenly start changing, and then learning completes.

Attached is a plot of the learning dynamics (screenshot: Screen Shot 2023-10-08 at 6:21:46 PM).
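For anyone debugging a plateau like this, a minimal sketch of how to check whether the flat loss really corresponds to vanishing gradients is below. It assumes a standard PyTorch training loop; `reward_model`, `optimizer`, and `step` are hypothetical placeholder names, not ImageReward's actual API.

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients (0.0 if no grads are populated)."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm(2).item() ** 2
    return total ** 0.5

# Inside the training loop, after loss.backward() and before optimizer.step():
# print(f"step {step}: loss={loss.item():.4f} "
#       f"grad_norm={global_grad_norm(reward_model):.6f}")
```

If the gradient norm is already non-zero during the flat region, the plateau is more likely an optimization/schedule effect than a dead-gradient problem.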

learn01one commented 1 year ago

Hello, which versions of Python and CUDA are you using? Thank you.

xujz18 commented 1 year ago

This is an interesting observation. I believe it may be related to the learning-rate schedule and warm-up settings, although other factors are worth exploring.
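To illustrate the point about the schedule, here is a minimal sketch of a cosine-decay learning rate with optional linear warm-up. It mirrors the general recipe rather than ImageReward's exact scheduler code; the function name and defaults are assumptions for illustration.

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-5, warmup_frac=0.0, min_lr=0.0):
    """Cosine decay from base_lr to min_lr, with an optional linear warm-up phase."""
    warmup_steps = int(warmup_frac * total_steps)
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warm-up
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# With warmup_frac=0.0 the schedule starts at the full base LR on step 0:
print([round(lr_at_step(s, 1000), 8) for s in (0, 10, 500, 999)])
```

With no warm-up, the very first updates are taken at the full base learning rate, which can interact with initialization and make the early loss curve look flat before it suddenly moves.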

bhattg commented 1 year ago

Hello, sorry I couldn't get back to you earlier. The Python version is 3.10.13 and CUDA is 11.7.

The experiment was run with torch 1.13.0.

Regarding the learning dynamics, I am using the following settings:

--fix_rate 0.7 --lr 1e-05 --lr-decay-style cosine --warmup 0.0 --batch_size 32 --accumulation_steps 1 --epochs 50
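Since --fix_rate 0.7 presumably freezes a large fraction of the backbone (an assumption based on the flag name, not a statement of ImageReward's exact semantics), one quick sanity check is to count how many parameters actually receive gradients after the model is set up. A small sketch, with `reward_model` as a hypothetical placeholder:

```python
import torch

def summarize_trainable(model: torch.nn.Module) -> None:
    """Print how many parameters are trainable vs. frozen."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    total = max(1, trainable + frozen)
    print(f"trainable: {trainable:,}  frozen: {frozen:,}  "
          f"({100.0 * trainable / total:.1f}% trainable)")

# Usage (hypothetical): summarize_trainable(reward_model)
```

If only a small trainable head is left, the early plateau could simply reflect how long it takes those few parameters to move the loss at lr 1e-05 with no warm-up.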