EleutherAI / the-pile

tfds_pile #100

Open everks opened 1 year ago

everks commented 1 year ago

I manually downloaded the Pile dataset and tried to use pile_tfds.py to create a TensorFlow dataset. I found that the _read_fn of PileReader only adds text to result when text is a list, but in the downloaded data the actual format seems to be str. So maybe result['text'] = text should be outside the if statement?

if isinstance(text, list):
    text = self.para_joiner.join(text)
    result['text'] = text
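
For illustration, here is a minimal standalone sketch of the change I have in mind. It is not the actual code from pile_tfds.py: the function name build_result and the "\n\n" default for para_joiner are my own assumptions, used only to show the assignment moved outside the if statement so that both str and list inputs end up in result.

```python
def build_result(text, para_joiner="\n\n"):
    """Hypothetical stand-in for the text handling in PileReader._read_fn.

    `build_result` and the "\n\n" default joiner are assumptions made
    for this sketch, not names or values taken from pile_tfds.py.
    """
    result = {}
    # Join only when the document arrives as a list of paragraphs.
    if isinstance(text, list):
        text = para_joiner.join(text)
    # Assign unconditionally so plain str documents are kept as well.
    result['text'] = text
    return result


if __name__ == "__main__":
    print(build_result("a single string document"))
    print(build_result(["first paragraph", "second paragraph"]))
```

With the assignment inside the if statement, the first call would return an empty dict, which matches what I see when building the dataset.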