HeegyuKim / torch-xla-SPMD

PyTorch/XLA SPMD test code for Google TPU
MIT License

Extend to v4-32 #2

Open opooladz opened 6 months ago

opooladz commented 6 months ago

Hello, I am wondering if this could be extended to v4-32 TPUs, and also to from-scratch training of LLMs on SlimPajama. I believe this would be useful, as this is one of the few PyTorch/XLA repos. There is also this repo.

HeegyuKim commented 5 months ago

Sorry for the late reply.

I think Torch XLA supports TPU pod training: https://pytorch.org/xla/master/#running-spmd-on-tpu-pod https://github.com/pytorch/xla/blob/master/docs/pjrt.md#pods

It is hard for me to write test code right now. This code may be helpful to you: https://github.com/pytorch/xla/blob/46919a478fa6d4ba50ddbe9aa6e74343d1d650e0/test/test_train_mp_imagenet.py#L183
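
For anyone trying the pod setup from the links above, here is a rough, untested sketch of how a 2D device mesh could be sized from the global device count. The helper names `mesh_shape` and `build_mesh` and the axis names `"data"` and `"model"` are made up for illustration and are not part of this repo. The figure of 16 devices for v4-32 is an assumption (one device per chip), so check what the runtime actually reports on your pod.

```python
from typing import Tuple


def mesh_shape(num_devices: int, model_parallel: int = 1) -> Tuple[int, int]:
    """Split num_devices into a (data, model) mesh shape.

    Hypothetical helper: model_parallel is the size of the model axis,
    and the remaining devices go to the data axis.
    """
    if num_devices <= 0 or model_parallel <= 0:
        raise ValueError("num_devices and model_parallel must be positive")
    if num_devices % model_parallel != 0:
        raise ValueError("model_parallel must divide num_devices evenly")
    return (num_devices // model_parallel, model_parallel)


def build_mesh(model_parallel: int = 1):
    """Build an SPMD mesh over all devices in the pod (needs torch_xla on a TPU)."""
    # Imported lazily so mesh_shape() can be tried without torch_xla installed.
    # Older torch_xla releases expose the sharding module under a different
    # path, so adjust the import to match your installed version.
    import numpy as np
    import torch_xla.runtime as xr
    import torch_xla.distributed.spmd as xs

    xr.use_spmd()  # switch the runtime to SPMD mode
    num_devices = xr.global_runtime_device_count()
    device_ids = np.arange(num_devices)
    return xs.Mesh(device_ids, mesh_shape(num_devices, model_parallel), ("data", "model"))


if __name__ == "__main__":
    # Assumption: a v4-32 slice shows up as 16 devices.
    print(mesh_shape(16, model_parallel=4))
```

On a pod, the same script would have to be started on every host, as described in the linked PJRT docs. Only `mesh_shape` can be checked off-TPU.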