AI-Hypercomputer / maxtext

A simple, performant and scalable Jax LLM!
Apache License 2.0

Improve tfds perf in multihost env #862

Closed: aireenmei closed this 2 months ago

aireenmei commented 2 months ago

Have each host read only a subset of the data files in TFDS (this approach is already used by MLPerf on the mlperf/4.1 branch when dataset_type=c4_mlperf; credits to @ZhiyuLi-goog), and document data input best practices.
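The per-host file assignment described above can be sketched as follows. This is a minimal illustration, not MaxText's actual implementation: `files_for_host` is a hypothetical helper, and the sketch assumes the TFDS split is stored as a list of data files whose names sort the same way on every host. In a real JAX multihost job, `process_index` and `process_count` would typically come from `jax.process_index()` and `jax.process_count()`.

```python
from typing import List, Sequence


def files_for_host(all_files: Sequence[str], process_index: int, process_count: int) -> List[str]:
    """Hypothetical helper: return the data files this host should read.

    Files are assigned round-robin, so each file is read by exactly one host.
    """
    if process_count <= 0:
        raise ValueError("process_count must be positive")
    if not 0 <= process_index < process_count:
        raise ValueError("process_index must be in [0, process_count)")
    # Sort first so every host agrees on the same global ordering before sharding.
    ordered = sorted(all_files)
    # Host i takes files i, i + process_count, i + 2 * process_count, ...
    return ordered[process_index::process_count]


if __name__ == "__main__":
    # Assumed example: 8 data files spread across 4 hosts.
    files = [f"shard-{i:05d}.tfrecord" for i in range(8)]
    for host in range(4):
        # Host 0 prints ['shard-00000.tfrecord', 'shard-00004.tfrecord'].
        print(host, files_for_host(files, host, 4))
```

Round-robin assignment means no host reads and then discards data that belongs to another host. If the file count is not a multiple of the host count, some hosts receive one more file than others, so a real pipeline would still need to handle hosts running out of data at different steps.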