Closed dvsrepo closed 11 months ago
Hey there! Nice idea to add JudgeLM, I think that we can have a separate judgelm.py file with JudgeLMRating for the moment and then explore whether there's a better solution, which probably there is, so let's explore that before the release!
cool! and let's rename to UltraFeedbackRating too?
@alvarobartt now I see the initial naming alignment, they are *Preference so: UltraFeedbackPreference, JudgeLMPreference 😃
IMO the most compliant names as of now are probably UltraFeedbackTask and JudgeLMTask, WDYT? We can do UltraFeedbackPreference and JudgeLMPreference otherwise, but since those are already imported from distilabel.tasks.preference, maybe the "preference" part does not need to be present there?
Maybe even remove the *Task?
I'd keep it because we are assigning that to an arg named task, so I feel like it's more intuitive to just go with task=JudgeLMTask rather than task=JudgeLM, WDYT?
perfect
Hi!
We should include JudgeLM. I've been thinking about how to include it with regard to our discussion about the class structure and how to add new approaches to highly similar tasks (e.g., preference).
So this issue is an open discussion with @alvarobartt and @gabrielmbmb to find the right balance (at least for this early release).
Here's the prompt template (untested), config and output:
judgelm.jinja: As you can see there's no rating list explaining what's a 1 and what's a 10.
PreferenceTask settings:
output: I think they used a much simpler and cleverer way to generate the responses, with far fewer tokens and faster (the UltraFeedback output is bloated).
Looking at this, we can't make this template work by reusing MultRatingsTask, because we need to rewrite the parse_output function. This means MultRatingsTask is not a good name. Even if I'm not a big fan of this approach, we might need to name them UltraFeedbackRating and JudgeLMRating, both implementing PreferenceTask. What do you think? Are there any other ways, naming, structure? Otherwise it's fine to go this way for now.
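To make the proposal concrete, here is a rough sketch of two rating classes sharing a base and differing only in parse_output. Everything in it is an assumption for illustration: the PreferenceTask below is a hypothetical stand-in and not distilabel's actual class, the return shape is made up, and both output formats (a "Rating: n" line per response for UltraFeedback, a first line of space-separated scores for JudgeLM) are guesses at what the templates produce.

```python
import re
from abc import ABC, abstractmethod
from typing import Any, Dict


class PreferenceTask(ABC):
    """Hypothetical stand-in for the shared base class (not distilabel's real one)."""

    @abstractmethod
    def parse_output(self, output: str) -> Dict[str, Any]:
        """Turn the raw LLM completion into structured ratings."""


class UltraFeedbackRating(PreferenceTask):
    # Assumed format: one "Rating: <n>" line per response, each followed by a rationale.
    def parse_output(self, output: str) -> Dict[str, Any]:
        ratings = [float(m) for m in re.findall(r"Rating:\s*(\d+(?:\.\d+)?)", output)]
        return {"ratings": ratings}


class JudgeLMRating(PreferenceTask):
    # Assumed format: the first line holds space-separated scores,
    # and everything after it is the free-text explanation.
    def parse_output(self, output: str) -> Dict[str, Any]:
        first_line, _, rationale = output.strip().partition("\n")
        ratings = [float(score) for score in first_line.split()]
        return {"ratings": ratings, "rationale": rationale.strip()}


if __name__ == "__main__":
    print(JudgeLMRating().parse_output("8 9\nAssistant 2 is more complete."))
    print(UltraFeedbackRating().parse_output("Rating: 4\nRationale: ok\n\nRating: 2\nRationale: weak"))
```

With this split, the prompt template and the parser travel together in one class per approach, which is why a single shared MultRatingsTask parser does not fit both.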