Link Lovering data with RL4LMs dataset class

In our RL toy task, we want to use the Lovering et al. dataset for our prompts.

The Lovering et al. code exports the datasets as .tsv files. RL4LMs expects each dataset to be wrapped in a TextGenPool subclass of the following kind (see their README):

from rl4lms.data_pools.text_generation_pool import Sample, TextGenPool

class MyDataPool(TextGenPool):
    @classmethod
    def prepare(cls, split: str):
        ...  # placeholder: load the raw data for this split
        samples = []
        for ix, item in enumerate(...):  # placeholder: iterate over the loaded items
            # Each item becomes one Sample: a prompt plus its reference output(s).
            sample = Sample(id=f"{split}_{ix}",
                            prompt_or_input_text=item["document"],
                            references=[item["target"]]
                            )
            samples.append(sample)
        pool_instance = cls(samples)
        return pool_instance

We need to write a class like this one that loads the .tsv files and feeds their rows to the RL pipeline as samples.
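A minimal sketch of what such a class could look like is given here. Several names are assumptions for illustration, not taken from the Lovering et al. export or from RL4LMs: the class name `LoveringDataPool`, the extra `prepare` arguments `data_dir`, `text_column` and `label_column`, the `<data_dir>/<split>.tsv` file layout, and the column names `sentence` and `label`. The fallback `Sample` and `TextGenPool` definitions are simple stand-ins so the sketch also runs where RL4LMs is not installed.

```python
import csv
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List

try:
    from rl4lms.data_pools.text_generation_pool import Sample, TextGenPool
except ImportError:
    # Stand-ins (assumption) so the sketch runs without RL4LMs installed.
    # They only mimic the parts used by the README snippet.
    @dataclass
    class Sample:
        id: str
        prompt_or_input_text: str
        references: List[str]
        meta_data: Dict[str, Any] = field(default_factory=dict)

    class TextGenPool:
        def __init__(self, samples: List[Sample]):
            self._samples = samples

        def __len__(self) -> int:
            return len(self._samples)


class LoveringDataPool(TextGenPool):
    """Hypothetical pool that reads one .tsv file per split."""

    @classmethod
    def prepare(cls, split: str, data_dir: str = "data",
                text_column: str = "sentence", label_column: str = "label"):
        # Assumed layout: <data_dir>/<split>.tsv with a header row.
        tsv_path = Path(data_dir) / f"{split}.tsv"
        samples = []
        with open(tsv_path, newline="", encoding="utf-8") as handle:
            # QUOTE_NONE keeps quote characters inside sentences untouched.
            reader = csv.DictReader(handle, delimiter="\t",
                                    quoting=csv.QUOTE_NONE)
            for ix, row in enumerate(reader):
                samples.append(Sample(id=f"{split}_{ix}",
                                      prompt_or_input_text=row[text_column],
                                      references=[row[label_column]]))
        return cls(samples)
```

Calling `LoveringDataPool.prepare("train", data_dir=...)` would then return a pool with one sample per .tsv row. The column names and file layout should be checked against the actual exported files before relying on the defaults.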
