I shared this example with my research group. One member recommended Hugging Face Accelerate as an alternative way to train Hugging Face models on multiple GPUs. I don't suggest reworking this example, but perhaps it could be listed as an alternative somewhere in the FAQ or TODO list.
From #28