EleutherAI / the-pile

MIT License

Question regarding Shuffling #119

Open LeoXinhaoLee opened 6 months ago

LeoXinhaoLee commented 6 months ago

Hi, thank you very much for releasing this great dataset. Has the original Pile dataset (with 30 chunks) already been shuffled, or do we still need to shuffle it globally before using it for pretraining? Thank you.

yuzc19 commented 5 months ago

Hi @LeoXinhaoLee, I am also curious about this. Did you reach any conclusion?