I got a 600+G file and then used gpt-neox's dataloader to read the data, which was very slow: it takes about 6s to read 2048-length pieces of data. May I ask why?
I only get a 386G file: "386G Jan 30 13:28 pile_0.87_deduped_text_document.bin"
I also didn't get the '*.idx' file. Should we use the downloaded idx file directly?
I followed readme: