I have prototyped a very basic/dirty code approach. Ideally we'd like to get inputs in a conversation format [{"content": ..., "role": "user"}, ...]
and loop through that in a Jinja template to fill the conversation.
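For illustration, here is a minimal sketch of what that could look like (the template string and variable names are hypothetical, not part of the prototype):
from jinja2 import Template

# Hypothetical template: render each message as "[role]\ncontent" and leave
# a trailing [user] header for the model to complete with the follow-up.
conversation_template = Template(
    "{% for message in messages %}"
    "[{{ message.role }}]\n{{ message.content }}\n"
    "{% endfor %}"
    "[user]\n"
)

messages = [
    {"role": "user", "content": "What is distilabel?"},
    {"role": "assistant", "content": "A framework to build datasets with AI feedback."},
]
print(conversation_template.render(messages=messages))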
import re
import os
from typing import Any, Dict, List
from distilabel.tasks import TextGenerationTask
from distilabel.tasks.prompt import Prompt
from distilabel.pipeline import Pipeline
from dataclasses import dataclass
from distilabel.llm import OpenAILLM
from datasets import Dataset
multiturner_prompt = """Please read the following conversation between a USER and an AI ASSISTANT and write a follow-up question from the USER.
The follow-up question from the user should be highly related to the previous interaction, direct, concise, logically sound, and sometimes challenging for the Assistant.
Avoid superfluous text praising the response or giving thanks; remember users don't waste words giving thanks but are rather very direct with AI assistants.
[USER]
{instruction}
[AI ASSISTANT]
{generation}
[USER]
"""
@dataclass
class MultiTurner(TextGenerationTask):
    system_prompt: str = "You are exceptionally skilled at crafting highly interesting and sometimes challenging conversations between a user and AI assistants"

    def generate_prompt(self, input: Dict[str, str]) -> Prompt:
        formatted_prompt = multiturner_prompt.format(
            instruction=input["instruction"], generation=input["generation"]
        )
        return Prompt(
            system_prompt=self.system_prompt,
            formatted_prompt=formatted_prompt,
        )

    def parse_output(self, output: str) -> Dict[str, str]:
        return {"generations": output}
generator = OpenAILLM(
model="gpt-3.5-turbo",
task=MultiTurner(),
max_new_tokens=1024,
num_threads=4,
openai_api_key=os.getenv("OPENAI_API_KEY", None),
temperature=0.7
)
pipe = Pipeline(generator=generator)
from datasets import load_dataset
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")
def generate_input(r):
    return {
        "input": {
            "instruction": r["chosen"][0]["content"],
            "generation": r["chosen"][1]["content"],
        }
    }
dataset = dataset.select(range(10)).map(generate_input)
This generates relatively good follow-up questions (looking at a small sample).
Hi @dvsrepo! Thanks for the detailed issue 🤗 I have one doubt w.r.t. the naming of the task: when you say multi-turn,
do you mean allowing the task to receive a list of assistant-human interactions and fill in the next one in the sequence, or chaining those to generate N turns from the one provided? The first one seems feasible and could be easily integrated anytime, but if the second one implies re-using the generated content to generate more and chaining that sequentially N times, that may be more complex with the current approach, but we can talk about it!
Hi @alvarobartt, this is discussed in open question 2.
I'd like to start with generating one more user message, but the purpose of this component is to build multi-turn datasets, even if that means running this pipeline several times or chaining it in combination with a response generation task.
The current approach just visually shows what I have in mind and is not intended to cover the open questions. As mentioned in 2., generating full multi-turn conversations will have an impact on quality and will be more complex, as you highlight.
I don't care much about the name; for me, multi-turn expresses the final utility of what can be achieved with this component, but we can change it to FollowUp or something like that.
This is my working hacky example with UltraFeedback:
If this task returned conversations in the OpenAI conversation format, we could chain it more easily with the response generation pipeline, and potentially with more rounds of follow-up generation (to generate several turns). Something like:
from datasets import load_dataset
import re
import os
from typing import Any, Dict, List
from distilabel.tasks import TextGenerationTask
from distilabel.tasks.prompt import Prompt
from distilabel.pipeline import Pipeline
from dataclasses import dataclass
from distilabel.llm import OpenAILLM
from datasets import Dataset
# I think we could define this as a jinja template
# and use the OpenAI chat format to render the conversation (containing arbitrary turns)
multiturner_prompt = """Please read the following conversation between a USER and an AI ASSISTANT and write a follow-up question from the USER.
The follow-up question from the user should be highly related to the previous interaction, direct, concise, logically sound, and sometimes challenging for the Assistant.
Avoid superfluous text praising the response or giving thanks; remember users don't waste words giving thanks but are rather very direct with AI assistants.
[user]
{instruction}
[assistant]
{generation}
[user]
"""
@dataclass
class MultiTurner(TextGenerationTask):
    system_prompt: str = "You are exceptionally skilled at crafting highly interesting and sometimes challenging conversations between a user and AI assistants"

    # if this would accept the OpenAI chat format it would be awesome
    # so it's more chainable
    def generate_prompt(self, input: Dict[str, str]) -> Prompt:
        formatted_prompt = multiturner_prompt.format(
            instruction=input["instruction"], generation=input["generation"]
        )
        return Prompt(
            system_prompt=self.system_prompt,
            formatted_prompt=formatted_prompt,
        )

    # should this return the OpenAI format too?
    # with the follow-up message as a new message
    def parse_output(self, output: str) -> Dict[str, str]:
        return {"generations": output}
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")
generator = OpenAILLM(
model="gpt-3.5-turbo",
task=MultiTurner(),
max_new_tokens=1024,
num_threads=8,
openai_api_key=os.getenv("OPENAI_API_KEY", None),
temperature=0.7
)
pipe = Pipeline(generator=generator)
def generate_input(r):
    return {
        "input": {
            "instruction": r["chosen"][0]["content"],
            "generation": r["chosen"][1]["content"],
        }
    }
dataset = dataset.shuffle().select(range(6000)).map(generate_input)
generated_ds = pipe.generate(dataset=dataset)
def make_input(r):
    input = []
    for message in r["chosen"]:
        input.append(f"[{message['role']}]")
        input.append(f"{message['content']}")
    input.append(f"[user]\n{r['followup'][0]}")
    input.append("[assistant]\n")
    return {"input": "\n".join(input)}
# this is pretty useless, I don't know why I did it like this
# if we could leverage a standard format as output of the previous pipeline that would be cool
ds = generated_ds.filter(lambda r: r["generations"] is not None).rename_columns({"generations": "followup"}).map(make_input)
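# Sketch (not part of the original PoC): the same step keeping the standard
# chat format instead of flattening everything into a single string, so the
# output stays chainable with further follow-up/response pipelines.
def make_chat_input(r):
    messages = list(r["chosen"])
    messages.append({"role": "user", "content": r["followup"][0]})
    return {"messages": messages}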
def load_gpt3(task):
    from distilabel.llm import OpenAILLM

    return OpenAILLM(
        model="gpt-3.5-turbo",
        task=task,
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        max_new_tokens=1024,
        num_threads=8,
        temperature=1.0,
    )
def load_gpt4(task):
    from distilabel.llm import OpenAILLM

    return OpenAILLM(
        model="gpt-4",
        task=task,
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        max_new_tokens=1024,
        num_threads=8,
        temperature=1.0,
    )
from distilabel.llm import LLMPool, ProcessLLM

generator = LLMPool(
    [
        ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_gpt4),
        ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_gpt3),
    ]
)
pipeline = Pipeline(generator=generator)
generated = pipeline.generate(dataset=ds.select(range(100)), num_generations=2, batch_size=1)
We might want to add a tuned version of UltraFeedback to clearly indicate there's a chat history and that the labeler should focus on the last response in the interaction:
from distilabel.tasks.preference.ultrafeedback import UltraFeedbackTask
from distilabel.llm.openai import OpenAILLM
task = UltraFeedbackTask.for_text_quality(
    task_description="\n# General Response Quality and Accuracy Assessment\nEvaluate the assistant's outputs based on various criteria:\n1. **Correctness & Informativeness**: Does the output provide accurate and helpful information?\n2. **Honesty & Uncertainty**: How confidently does the assistant convey its information, and does it express uncertainty appropriately?\n3. **Truthfulness & Hallucination**: Does the assistant introduce misleading or fabricated details?\n4. **Instruction Following**: Does the assistant's output align with given instructions and the user's intent?\nYour role is to provide a holistic assessment considering all the above factors, focusing only on the response to the last question of the [user]. Use the full conversation only for context but focus on rating which response is better and more appropriate. Even if they are both almost correct, please highlight the differences in the rating and the rationale.\n\n**Scoring**: Rate outputs from 1 to 5 based on the overall quality, providing a single number (not 5/5 or something similar), considering all aspects:\n"
)
labeller = OpenAILLM(
    model="gpt-4",
    task=task,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    max_new_tokens=1024,
    num_threads=8,
)
pref_pipe = Pipeline(labeller=labeller)
labelled2 = pref_pipe.generate(dataset=generated.select(range(5)), num_generations=2, batch_size=4)
@alvarobartt and @plaguss, I have included my full steps for the PoC. There are tons of possible improvements, especially for defining inputs and outputs in a way that makes it easier to chain these pipelines.
Hi here! I've discussed some potential improvements w.r.t. how the Prompt is defined, and also w.r.t. defining responsibilities across the different classes, in order to move some LLM-specific stuff to the LLMs, while simplifying the Prompt dataclass and providing some formatting helpers. That said, most likely we'll end up with chat or instruct formats and functions to prepare those.
Finally, regarding the variable naming and chaining, we should define what the pain points are as of now, and what's a nice way to tackle those with minimal impact.
cc @gabrielmbmb
Description
A high-impact task for distilabel is one that generates follow-up turns or multi-turn dialogues (which can then be critiqued/ranked).
Given a conversation (or at least a prompt+response pair), the generator will generate a follow-up message (from the user role).
Ideally, the input would be a standard conversation/list-of-messages format (like the one we use in uf, zephyr, etc.). This format can be used to build the generator prompt and ask it to generate a follow-up message.
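For concreteness, a minimal illustrative example of that message-list input (the contents here are made up):
conversation = [
    {"role": "user", "content": "How do I sort a list in Python?"},
    {"role": "assistant", "content": "Use sorted(my_list) or my_list.sort()."},
]
# The generator would render this into a prompt and produce the next user
# turn, e.g. {"role": "user", "content": "Does sort() return the list?"}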
This can be developed in parallel with or before #130.
Open questions:
- Chaining: Instructions -> MultiTurner, and then use our current pipelines: Response generator -> Labeler?
- Naming of the MultiTurner task?