I deleted the instructions that do not use Docker. Without a Docker image capturing the dependencies, performance will inevitably change and we cannot ensure reproducibility.
Update train.sh to pass the per-host batch size to Hugging Face.
Mention the recommendation to use untwisted TPU topologies.
Link to a new Docker image.