Open opooladz opened 6 months ago
Sorry for the late reply.
I think PyTorch/XLA supports TPU pod training: https://pytorch.org/xla/master/#running-spmd-on-tpu-pod and https://github.com/pytorch/xla/blob/master/docs/pjrt.md#pods
It is hard for me to put together test code right now, but this code may be helpful to you: https://github.com/pytorch/xla/blob/46919a478fa6d4ba50ddbe9aa6e74343d1d650e0/test/test_train_mp_imagenet.py#L183
Hello, I am wondering if we can get an extension to v4-32 TPUs, and also some from-scratch training of LLMs on SlimPajama. I believe this would be useful, as this is one of the few PyTorch/XLA repos. There is also this repo.