THUDM / ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Apache License 2.0

Training ImageReward model on different budgets #50

Open bhattg opened 1 year ago

bhattg commented 1 year ago

Hi! The paper mentions that training the ImageReward model is not easy and is sensitive to hyperparameters. The hyperparameter section says: "We find that fixing 70% of transformer layers with a learning rate of 1e-5 and batch size of 64 can reach up to the best preference accuracy."

Is this for the 8k budget? Could you share suitable hyperparameters for the other budgets as well?

Secondly, which part of the code freezes the transformer layers? Thanks!

xujz18 commented 1 year ago

Thanks for the discussion! Firstly, those hyperparameters are for the 8k budget (though the data shuffle may differ, so it is worth trying slightly different values). Secondly, see https://github.com/THUDM/ImageReward/blob/main/train/src/ImageReward.py#L87-L99.
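For readers following along: "fixing 70% of transformer layers" amounts to disabling gradients on the lower blocks of each encoder. Below is a minimal PyTorch sketch of that idea; the attribute names (e.g. `visual_encoder.blocks`) are assumptions for illustration, not necessarily what the repo uses, so refer to the linked lines for the actual implementation.

```python
import torch.nn as nn

def freeze_fraction(blocks, fix_rate: float = 0.7) -> None:
    """Freeze the first `fix_rate` fraction of a sequence of transformer blocks.

    Sketch only: assumes the encoder exposes its layers as an ordered list
    of nn.Module blocks (hypothetical layout).
    """
    n_freeze = int(len(blocks) * fix_rate)
    for block in blocks[:n_freeze]:
        for param in block.parameters():
            param.requires_grad_(False)

# Hypothetical usage, assuming a BLIP-style backbone with these attributes:
# freeze_fraction(model.blip.visual_encoder.blocks, fix_rate=0.7)
# freeze_fraction(model.blip.text_encoder.encoder.layer, fix_rate=0.7)
```

Only the remaining (unfrozen) parameters then need to be passed to the optimizer.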

bhattg commented 1 year ago

Thank you very much!

bhattg commented 1 year ago

Hey, would it be possible to provide the hyperparameters for the 1k and 4k settings as well? That would be very useful.

xujz18 commented 1 year ago

The 8k hyperparameters should only need minor adjustments to accommodate the 1k/2k/4k budgets.
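Since a smaller budget may shift the optimum slightly, one practical approach is a small sweep around the 8k settings. A hypothetical sketch follows; the script name `train.py` and the flags `--fix_rate`, `--lr`, `--batch_size` are placeholders, not the repo's actual CLI, so adapt them to the training entry point you use.

```python
from itertools import product
import subprocess

# Hypothetical sweep around the reported 8k settings (fix 70%, lr 1e-5, bs 64)
# for a smaller (1k/4k) budget. Flag names are placeholders.
fix_rates = [0.5, 0.7, 0.9]
lrs = [5e-6, 1e-5, 2e-5]
batch_sizes = [32, 64]

for fix_rate, lr, bs in product(fix_rates, lrs, batch_sizes):
    subprocess.run(
        [
            "python", "train.py",
            "--fix_rate", str(fix_rate),
            "--lr", str(lr),
            "--batch_size", str(bs),
        ],
        check=True,
    )
```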

bhattg commented 1 year ago

Thanks! In your experience, which hyperparameters were the most sensitive? I will try to tune those first.