⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
This PR adds a new step, CombineKeys, that merges the values of several keys in a dict into a single key. The main use case I found is the following:
In a Pipeline I have a Task that generates instructions (say GenerateSentencePair), and I want more examples of the generated positives for diversity. A second task generates those extra examples, so afterwards I want to combine the original positive key with the key holding the new ones (say a list of instructions called positives or whatever). With CombineKeys we can combine those keys:
from distilabel.steps import CombineKeys

combiner = CombineKeys(
    keys=["queries", "multiple_queries"],
    output_key="queries",
)
combiner.load()

result = next(
    combiner.process(
        [
            {
                "queries": "How are you?",
                "multiple_queries": ["What's up?", "Everything ok?"],
            }
        ],
    )
)
# >>> result
# [{'queries': ['How are you?', "What's up?", 'Everything ok?']}]
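
To make the merging behaviour concrete, here is a minimal sketch in plain Python that reproduces the output above without depending on distilabel. It is not the step's actual implementation: combine_keys is a hypothetical helper, and it assumes that scalar values are wrapped into one-item lists and that the source keys are dropped from the resulting row.

```python
from typing import Any, Dict, List


def combine_keys(row: Dict[str, Any], keys: List[str], output_key: str) -> Dict[str, Any]:
    """Hypothetical helper: merge the values stored under `keys` into one list."""
    result = dict(row)  # shallow copy, so the caller's row is left untouched
    combined: List[Any] = []
    for key in keys:
        # Assumption: the source keys are removed from the output row.
        value = result.pop(key)
        # Wrap scalars so a string is appended whole, not split into characters.
        combined.extend(value if isinstance(value, list) else [value])
    result[output_key] = combined
    return result


row = {
    "queries": "How are you?",
    "multiple_queries": ["What's up?", "Everything ok?"],
}
merged = combine_keys(row, keys=["queries", "multiple_queries"], output_key="queries")
print(merged)
```

Because output_key may reuse one of the input keys (as "queries" does here), the sketch pops every source key before writing the merged list, so the output key ends up holding only the combined values.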