bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Can we also train the BLOOM model using tensor parallelism and efficient fused CUDA kernels? #334

Open CloudedLeopard17 opened 2 years ago

CloudedLeopard17 commented 2 years ago

Hi,

Thanks for the great work. I was able to run inference on the 7.1B-parameter BLOOM model within 24 GB of GPU memory. Can we train the BLOOM models using tensor parallelism and efficient fused CUDA kernels? I don't have access to high-memory hardware.
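For reference, a minimal sketch of how inference for a model of that size can fit in 24 GB, assuming the Hugging Face `bigscience/bloom-7b1` checkpoint loaded in fp16 (the exact loading path used by the poster is not stated in the thread):

```python
# Minimal sketch: fp16 inference on BLOOM-7b1 (~14 GB of weights), assuming the
# Hugging Face checkpoint; the poster's actual setup is not stated here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",
    torch_dtype=torch.float16,  # half precision keeps the weights around 14 GB
    device_map="auto",          # place layers on the available GPU(s)
)

inputs = tokenizer("BLOOM is a multilingual language model", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```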

mayank31398 commented 2 years ago

I don't think you will be able to do this on a 24 GB GPU. I am guessing you are using an RTX 3090? You can give it a try.
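For a rough sense of why this is tight: with standard mixed-precision Adam training (roughly 16 bytes of weights, gradients, and optimizer state per parameter, as estimated in the ZeRO paper), the 7.1B-parameter model needs on the order of 100 GB before activations, far beyond a single 24 GB card:

```python
# Back-of-the-envelope estimate for mixed-precision Adam training, assuming
# ~16 bytes/parameter (fp16 weights + fp16 grads + fp32 master weights,
# momentum, and variance), as in the ZeRO paper. Activations and temporary
# buffers are extra.
params = 7.1e9
bytes_per_param = 2 + 2 + 12            # weights + grads + optimizer states
total_gib = params * bytes_per_param / 1024**3
print(f"~{total_gib:.0f} GiB of model/optimizer state")  # ~106 GiB
```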

CloudedLeopard17 commented 2 years ago

I am using 2x A5000 GPUs. I was able to train the T5-XL model using tensor parallelism.

mayank31398 commented 2 years ago

Did you use Megatron? Or does DeepSpeed have support for tensor parallelism?

CloudedLeopard17 commented 2 years ago

DeepSpeed supports model parallelism (MP) to fit large models that would otherwise not fit in GPU memory.
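A minimal sketch of that model-parallel path on the inference side, assuming DeepSpeed's `init_inference` API with kernel injection across the two A5000s and the Hugging Face BLOOM 7b1 checkpoint (neither is confirmed by the thread); whether the same applies to training is the open question here:

```python
# Minimal sketch: DeepSpeed model-parallel (tensor-sliced) inference with fused
# CUDA kernels, assuming 2 GPUs and the Hugging Face bigscience/bloom-7b1 checkpoint.
# Launch with: deepspeed --num_gpus 2 this_script.py
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "2"))

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1", torch_dtype=torch.float16)

# Shard the model across the GPUs and swap in DeepSpeed's fused inference kernels.
model = deepspeed.init_inference(
    model,
    mp_size=world_size,              # tensor-parallel degree
    dtype=torch.half,
    replace_with_kernel_inject=True,
)
model = model.module                 # unwrap to the underlying HF model for .generate()

inputs = tokenizer("DeepSpeed tensor parallelism", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```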