Infini-AI-Lab / Sequoia

scalable and robust tree-based speculative decoding algorithm

data loading timing and disk use #4

Open poedator opened 4 months ago

poedator commented 4 months ago

The dataset loading code takes too long. It downloads entire huge datasets (70 GB wiki, etc.) to use just a handful of examples. Setting `split="train[0:2000]"` does not help, since slicing happens only after the full download. Suggestions: