THUDM / ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

ReFL implementation details #57

hkunzhe opened this issue 1 year ago

hkunzhe commented 1 year ago

As mentioned in #24 and #34, the current ReFL code implements only the ReFL loss; the pre-training loss is not included. In addition, the two losses are supposed to be optimized alternately.

I want to add pre-training data myself. Without gradient accumulation, the pseudocode would look like this:

# Given optimizer and lr_scheduler with unet.
# Compute Pre-training Loss `train_loss` with unet and update unet.
train_loss.backward()
optimizer.step()
lr_scheduler.step()  # is it necessary?
optimizer.zero_grad()

# Compute ReFL Loss `refl_loss` with unet and update unet.
refl_loss.backward()
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()

However, after reading this post, I'm still confused about how to wrap this with accelerator.accumulate(unet) for gradient accumulation. I also raised issue huggingface/accelerate#1870 and a discussion in the Hugging Face Accelerate GitHub repo and forum, but I haven't gotten a clear answer. Could you give me some pseudocode or hints? Thank you very much! @xujz18 @tongyx361
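
For concreteness, here is a rough sketch of one pattern I am considering: doing the accumulation by hand instead of relying on accelerator.accumulate, so that each effective optimizer step still uses only one of the two losses. The helpers compute_pretrain_loss / compute_refl_loss, the two data loaders, and num_updates are hypothetical placeholders, not code from this repo:

# A rough sketch (placeholders, not repo code): manual gradient accumulation that
# keeps the pre-training and ReFL updates strictly alternating.
# Given unet, optimizer, lr_scheduler as in the pseudo code above.
from accelerate import Accelerator

grad_accum_steps = 4
accelerator = Accelerator()  # accumulation handled by hand below
unet, optimizer, lr_scheduler, pretrain_loader, refl_loader = accelerator.prepare(
    unet, optimizer, lr_scheduler, pretrain_loader, refl_loader
)
pretrain_iter, refl_iter = iter(pretrain_loader), iter(refl_loader)

for _ in range(num_updates):
    # Pre-training update: accumulate scaled gradients over several micro-batches,
    # then apply a single optimizer step.
    for _ in range(grad_accum_steps):
        batch = next(pretrain_iter)
        train_loss = compute_pretrain_loss(unet, batch) / grad_accum_steps
        accelerator.backward(train_loss)
    optimizer.step()
    lr_scheduler.step()
    optimizer.zero_grad()

    # ReFL update: same pattern, so gradients from the two losses never mix
    # inside one optimizer step.
    for _ in range(grad_accum_steps):
        batch = next(refl_iter)
        refl_loss = compute_refl_loss(unet, batch) / grad_accum_steps
        accelerator.backward(refl_loss)
    optimizer.step()
    lr_scheduler.step()
    optimizer.zero_grad()

My concern is that accelerator.accumulate seems to advance its internal step counter every time the context manager is entered, so wrapping both losses with it might mix their gradients within a single accumulation window; doing the accumulation manually would sidestep that, at the cost of an extra gradient sync per micro-batch under DDP. Is this the right way to think about it?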

hkunzhe commented 1 year ago

This may be a duplicate of https://github.com/THUDM/ImageReward/issues/34#issuecomment-1687365733. I was afraid it wouldn't be seen in a closed issue, so I opened this new one.

xujz18 commented 1 year ago

Your understanding is correct and I appreciate the discussion.