hkunzhe opened this issue 1 year ago
This may duplicate https://github.com/THUDM/ImageReward/issues/34#issuecomment-1687365733, but I was afraid it would not be seen in a closed issue, so I opened this new one.
> Your understanding is correct and I appreciate the discussion.
As mentioned in #24 and #34, the current ReFL code implements only the ReFL loss; the pre-training loss is not included. In addition, the two losses are supposed to be optimized alternately.
I want to add the pre-training data myself. Without gradient accumulation, the pseudo code would look like this:
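A minimal sketch, assuming the `unet`, `optimizer`, and `accelerator` objects from the existing ReFL training script; `compute_refl_loss`, `compute_pretrain_loss`, `refl_dataloader`, and `pretrain_dataloader` are hypothetical names, not anything from the repo:

```python
import itertools

# Alternate objectives: even steps take a ReFL update, odd steps take a
# standard pre-training (denoising MSE) update.
for step, (refl_batch, pretrain_batch) in enumerate(
        zip(refl_dataloader, itertools.cycle(pretrain_dataloader))):
    if step % 2 == 0:
        # ReFL step: generate with the current UNet and score the decoded
        # image with the ImageReward model (hypothetical helper).
        loss = compute_refl_loss(unet, refl_batch)
    else:
        # Pre-training step: the usual latent-diffusion MSE loss
        # (hypothetical helper).
        loss = compute_pretrain_loss(unet, pretrain_batch)
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```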
However, after reading this post, I'm still confused about how to add `accelerator.accumulate(unet)` for gradient accumulation. I also opened huggingface/accelerate#1870 and started discussions in the Hugging Face Accelerate GitHub repo and forum, but I haven't gotten a clear answer. Can you give me some pseudo code or hints? Thank you very much! @xujz18 @tongyx361
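For concreteness, below is the arrangement I'm currently considering (only a guess, not a confirmed answer; it reuses the hypothetical helpers from the sketch above and switches losses per accumulation window, so that ReFL and pre-training gradients are never mixed inside one window):

```python
# Assumption: choose the loss per accumulation window rather than per
# step, so every window accumulates gradients of a single objective.
accum_steps = accelerator.gradient_accumulation_steps

for step, (refl_batch, pretrain_batch) in enumerate(
        zip(refl_dataloader, itertools.cycle(pretrain_dataloader))):
    with accelerator.accumulate(unet):
        if (step // accum_steps) % 2 == 0:
            loss = compute_refl_loss(unet, refl_batch)
        else:
            loss = compute_pretrain_loss(unet, pretrain_batch)
        accelerator.backward(loss)
        # Under accumulate(), the accelerate-wrapped optimizer only
        # performs a real step when gradients sync at the window boundary.
        optimizer.step()
        optimizer.zero_grad()
```

I'm not sure whether this is the right way to combine `accumulate` with two alternating losses, which is exactly what I'd like to confirm.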