Closed EdoardoPona closed 1 year ago
In our RL toy task, we want to use the Lovering et al. dataset for our prompts.
The Lovering et al. code exports the datasets as .tsv files. RL4LMs expects datasets of the following kind (see their README):
from rl4lms.data_pools.text_generation_pool import Sample, TextGenPool

class MyDataPool(TextGenPool):
    @classmethod
    def prepare(cls, split: str):
        ..
        samples = []
        for ix, item in enumerate(..):
            sample = Sample(id=f"{split}_{ix}",
                            prompt_or_input_text=item["document"],
                            references=[item["target"]]
                            )
            samples.append(sample)
        pool_instance = cls(samples)
        return pool_instance
We need to write a class like this one that loads the .tsv files and feeds the resulting samples to the RL pipeline.
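A minimal sketch of the .tsv loading step, to make the plan concrete. It only uses the standard library and builds the keyword arguments that the `Sample` constructor in the README snippet takes (`id`, `prompt_or_input_text`, `references`). The function name `tsv_to_sample_kwargs` and the column names `sentence` and `label` are assumptions for illustration; we would need to check them against the header of the actual exported files.

```python
import csv
from typing import Dict, Iterable, List


def tsv_to_sample_kwargs(
    lines: Iterable[str],
    split: str,
    prompt_column: str = "sentence",  # hypothetical column name
    target_column: str = "label",     # hypothetical column name
) -> List[Dict[str, object]]:
    """Turn TSV lines (header row first) into keyword arguments for Sample."""
    # QUOTE_NONE keeps quote characters inside sentences as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    kwargs_list = []
    for ix, row in enumerate(reader):
        kwargs_list.append(
            {
                "id": f"{split}_{ix}",
                "prompt_or_input_text": row[prompt_column],
                "references": [row[target_column]],
            }
        )
    return kwargs_list


# Inside MyDataPool.prepare, this could be used roughly as follows
# (tsv_path is a hypothetical path to the exported file for that split):
#     with open(tsv_path, newline="", encoding="utf-8") as f:
#         samples = [Sample(**kw) for kw in tsv_to_sample_kwargs(f, split)]
#     return cls(samples)

# Small in-memory example; any iterable of lines works, including an open file.
example_lines = ["sentence\tlabel\n", "the cat sat\t1\n", "a dog ran\t0\n"]
example = tsv_to_sample_kwargs(example_lines, split="train")
```

Keeping the parsing separate from `Sample` means it can be tested without installing RL4LMs; the pool class then only wraps the result.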
@diogo-cruz how is this going?