argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

[FEATURE] Create a task template for Preference Data Collection #4024

Closed: dvsrepo closed this issue 1 year ago

dvsrepo commented 1 year ago

Is your feature request related to a problem? Please describe.
Hi! While doing a demo today with @kursathalat and @davidberenstein1957, I realized we don't have a simple way to set up a preference dataset, besides rg.FeedbackDataset.for_direct_preference_optimization, which in my opinion is too "narrow" (e.g., the same data could serve a reward model trainer or LLM evaluation) and doesn't use the ranking question.

I understand that we have task-focused templates, but I'd like to have more general-purpose datasets too (we will discuss how as we go). For this specific case I suggest creating something like:

Describe the solution you'd like

# not the best name but we can brainstorm
ds = rg.FeedbackDataset.for_preference_collection(
   num_responses=3, # default is 2
   ... # the rest can be the same as dpo
)

# FeedbackDataset(
#   fields=[
#       TextField(name="input", use_markdown=True),
#       TextField(name="context", use_markdown=True),
#       TextField(name="response-1", use_markdown=True),
#       TextField(name="response-2", use_markdown=True),
#       TextField(name="response-3", use_markdown=True),
#   ],
#   questions=[
#       RankingQuestion(name="preference", values=["response-1", "response-2", "response-3"])
#   ],
#   guidelines="<Guidelines for the task>",
# )
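
For reference, a minimal sketch of what such a template could expand to with today's building blocks; for_preference_collection and num_responses are the proposed (not yet existing) API, and everything else assumes the Argilla 1.x FeedbackDataset classes:

# Sketch only: assembling the proposed dataset from existing pieces
# (assumes the Argilla 1.x Python API; num_responses mirrors the proposal above).
import argilla as rg

num_responses = 3  # the proposed default is 2

ds = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="input", use_markdown=True),
        rg.TextField(name="context", use_markdown=True),
    ]
    + [
        rg.TextField(name=f"response-{i}", use_markdown=True)
        for i in range(1, num_responses + 1)
    ],
    questions=[
        rg.RankingQuestion(
            name="preference",
            title="Rank the responses from best to worst",
            values=[f"response-{i}" for i in range(1, num_responses + 1)],
        )
    ],
    guidelines="<Guidelines for the task>",
)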
dvsrepo commented 1 year ago

Thinking about it, I'd recommend renaming/refactoring direct_preference_optimization to be a more general way to collect preference data.

for_direct_preference_optimization ties the dataset to a specific algorithm rather than a task. An analogy would be having for_svm or for_naive_bayes instead of for_text_classification. We'll see more and more algorithms for preference tuning or preference optimization, and DPO is just one of them. Using preference_tuning or preference_optimization would also feel too narrow, because one key use case of preference data is evaluation, but I'm happy to discuss (maybe preference_optimization is better than preference_collection?).

davidberenstein1957 commented 1 year ago

https://github.com/argilla-io/argilla/blob/7c697ce54655194f33713c608d3dd79c68e23546/src/argilla/client/feedback/dataset/local/mixins.py#L688

davidberenstein1957 commented 1 year ago

@dvsrepo I think I forgot to add it to the docs 😅

davidberenstein1957 commented 1 year ago

I agree with you @dvsrepo; the difficulty, however, is that it is an iterative process where some things are required=True and others are required=False, if you know what I mean. I agree we might simplify, but I'm also afraid that some users might not intuitively distinguish between text2text and summarization/translation, or between preference_modelling and dpo/ppo. Hence, I added the specific scenarios and their differences.
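
To make the required=True/required=False point concrete, a hedged sketch (the required attribute on fields and questions is assumed from the Argilla 1.x API; the field and question names are illustrative):

# Sketch only: the same layout can back different templates just by flipping `required`.
import argilla as rg

ds = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt", use_markdown=True),                   # required=True by default
        rg.TextField(name="context", use_markdown=True, required=False),  # optional in this variant
        rg.TextField(name="response-1", use_markdown=True),
        rg.TextField(name="response-2", use_markdown=True),
    ],
    questions=[
        rg.RankingQuestion(
            name="preference",
            values=["response-1", "response-2"],
            required=True,
        ),
        rg.TextQuestion(name="correction", required=False),  # optional follow-up question
    ],
)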

davidberenstein1957 commented 1 year ago

@kursathalat can you add the preference modelling one to the docs?

dvsrepo commented 1 year ago

@davidberenstein1957 I see, that's fine, but I still think we should improve the preference modeling template to allow setting the desired number of responses (all required) and to use a ranking question instead of the rating question over two responses.

davidberenstein1957 commented 1 year ago

Yes, I looked at that option, but post-processing RankingQuestions is a bit unintuitive, and @alvarobartt and I thought binary preference tuning was currently more common. So we can set it up if you think it is needed, but we did make a deliberate choice not to do it that way.

dvsrepo commented 1 year ago

For many RLHF use cases you want to collect rankings over more than 2 responses (as in the InstructGPT paper), since that gives you more chosen/rejected pairs. Allowing for ties is also important.
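
As an illustration of why rankings over more than two responses pay off, a single ranking fans out into several chosen/rejected pairs during post-processing (the helper below is illustrative, not Argilla API; ties contribute no pair):

# Illustration only: one ranking over N responses yields up to N*(N-1)/2 chosen/rejected pairs.
from itertools import combinations

def ranking_to_pairs(ranking: dict[str, int]) -> list[tuple[str, str]]:
    """ranking maps response name -> rank (1 = best); returns (chosen, rejected) pairs."""
    pairs = []
    for a, b in combinations(ranking, 2):
        if ranking[a] < ranking[b]:
            pairs.append((a, b))
        elif ranking[b] < ranking[a]:
            pairs.append((b, a))
        # equal ranks (ties) contribute no pair
    return pairs

# A ranking over 3 responses gives up to 3 pairs, versus 1 for a binary comparison.
print(ranking_to_pairs({"response-1": 1, "response-2": 2, "response-3": 2}))
# [('response-1', 'response-2'), ('response-1', 'response-3')]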

davidberenstein1957 commented 1 year ago

@kursathalat,