YangRui2015 / RiC

Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"

A question about Table 1 in the paper #6

Open CurryxIaoHu opened 2 months ago

CurryxIaoHu commented 2 months ago

Hello! Thanks for your interesting work. I have a question, though: why do the authors claim that MODPO is capable of inference-time adaptation (in Table 1)? It seems that MODPO doesn't allow users to modify their own preferences, and the model's weights are also fixed. That confuses me a lot.

YangRui2015 commented 2 months ago

Sorry for the confusion. You are correct; MODPO is not adaptive. The original writing in the MODPO paper was a bit unclear, and I only realized this when I ran its code. However, I forgot to update the camera-ready version. I will correct this in the arXiv version.