Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Integrating PyTorch XLA when using multiple GPUs #16130

Open Mohamed-Dhouib opened 1 year ago

Mohamed-Dhouib commented 1 year ago

Description & Motivation

I've experimented with PyTorch XLA using multiple NVIDIA A100 GPUs and observed that in most cases training is faster. So it would be really nice to have the option to use XLA for training in PyTorch Lightning.

The main advantage is faster training.

Additional context

Here is a code link: https://github.com/Dhouib-med/Test-XLA/blob/17e5b6bd6c77fffa67818462856277a57877ff3b/test_xla.py to train a simple CNN on the MNIST dataset using XLA (on 2 GPUs). The main parts were taken from https://github.com/pytorch/xla. This wheel needs to be installed along with adequate pytorch and torchvision versions (1.11 and 0.14): https://storage.googleapis.com/tpu-pytorch/wheels/cuda/112/torch_xla-1.13-cp37-cp37m-linux_x86_64.whl @justusschock
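In case the link above rots, here is a rough, untested sketch of the same idea: a tiny model trained through torch_xla's multiprocessing API. It assumes a torch_xla build with CUDA support (e.g. the wheel above); the model, hyperparameters, and synthetic data are placeholders, not the contents of the linked script.

```python
# Hypothetical sketch of multi-GPU training via torch_xla (not the linked
# test_xla.py). Requires a CUDA-enabled torch_xla build and GPU_NUM_DEVICES
# set in the environment (e.g. GPU_NUM_DEVICES=2).
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()  # one XLA device per spawned process
    model = nn.Linear(28 * 28, 10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Synthetic stand-in for the MNIST DataLoader used in the original script.
    dataset = torch.utils.data.TensorDataset(
        torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,))
    )
    loader = torch.utils.data.DataLoader(dataset, batch_size=32)
    mp_loader = pl.MpDeviceLoader(loader, device)  # moves batches to the XLA device

    for data, target in mp_loader:
        optimizer.zero_grad()
        logits = model(data.view(data.size(0), -1))
        loss = nn.functional.cross_entropy(logits, target)
        loss.backward()
        xm.optimizer_step(optimizer)  # all-reduces grads and marks the XLA step


if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=())  # one process per device
```

The key differences from plain CUDA training are `xm.xla_device()` instead of `torch.device("cuda")`, `MpDeviceLoader` for host-to-device transfer, and `xm.optimizer_step` in place of `optimizer.step()`.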

cc @borda @justusschock @awaelchli @carmocca @JackCaoG @steventk-g @Liyang90

carmocca commented 1 year ago

Let's do it! To be clear, this would be enabled with: Trainer(accelerator='cuda'|'gpu', strategy='xla')
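Spelled out as a full constructor call, the proposal above would look roughly like this (hedged: this API combination is what the comment proposes, not something that works yet at this point in the thread):

```python
# Proposed usage sketch: XLA as the communication/compilation strategy
# on top of the CUDA accelerator. Not yet implemented at this point.
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="gpu", devices=2, strategy="xla")
```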

justusschock commented 1 year ago

@carmocca I assume we could reuse a lot of our current XLA strategy for TPUs.

carmocca commented 1 year ago

That would be part of the goal

awaelchli commented 1 year ago

I like it, and I think it won't even be that hard! The abstractions of strategy and accelerator are already in place and are meant to support exactly this kind of relationship between a communication layer (XLA) and an accelerator (GPU/TPU). The first step towards this is to simply rename our TPUSpawnStrategy to XLAStrategy, which we already planned to do and have already done in lightning_lite.

Borda commented 1 year ago

This is great! :rabbit:

qipengh commented 1 year ago

Hello, this is wonderful work! I would like to know when this will be finished, so that Trainer(accelerator='cuda'|'gpu', strategy='xla') works normally.

awaelchli commented 1 year ago

@qipengh We haven't started working on it. The feature is up for grabs if you or anyone from the community has interest in contributing and testing it out.

carmocca commented 1 year ago

This should become very easy once we add support for XLA's PJRT runtime: https://github.com/pytorch/xla/blob/master/docs/pjrt.md#gpu
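For reference, the linked PJRT document selects the runtime and local GPU count through environment variables, so a run would look roughly like this (variable names are from that document; the script name is a placeholder):

```shell
# Select the PJRT runtime's CUDA backend and the number of local GPUs,
# then launch training. `train.py` is a hypothetical script name.
export PJRT_DEVICE=CUDA
export GPU_NUM_DEVICES=2
python train.py
```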

JackCaoG commented 1 year ago

FYI @Liyang90 has a PR to add PJRT support: https://github.com/Lightning-AI/lightning/pull/17352

carmocca commented 1 year ago

In addition, we need to land

stellarpower commented 1 month ago

Is there an example model showing how to use XLA with (a single) CUDA GPU? The link above now 404s, and I am struggling to find one anywhere; everything I come across is for TPUs only.

Roughly how much work do folks think is still needed in order to implement this FR?