-
Does DeepSpeed support hybrid parallelism, e.g. `data_parallel + pipeline_parallel + tensor_parallel`?
Can you show me an example of how to use these parallelisms together?
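DeepSpeed's pipeline engine does accept a three-axis (pipeline x model x data) topology. Below is a minimal sketch, not a complete training script: it assumes 8 GPUs launched with the `deepspeed` launcher, the toy `Linear` stack and config values are illustrative, and the model-parallel (`num_mp`) axis only takes real effect when the layers themselves are Megatron-style tensor-parallel modules.
```python
import torch
import deepspeed
from deepspeed.pipe import PipelineModule
from deepspeed.runtime.pipe.topology import PipeModelDataParallelTopology

deepspeed.init_distributed()

# 2 pipeline stages x 2 model(tensor)-parallel ranks x 2 data-parallel
# replicas = 8 GPUs. Plain nn.Linear layers are simply replicated along the
# model axis; real tensor parallelism needs Megatron-style layers.
topo = PipeModelDataParallelTopology(num_pp=2, num_mp=2, num_dp=2)

# Toy layer stack; a real model would use LayerSpec / tensor-parallel blocks.
model = PipelineModule(
    layers=[torch.nn.Linear(512, 512) for _ in range(8)],
    topology=topo,
    loss_fn=torch.nn.MSELoss(),
)

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={
        "train_batch_size": 16,  # = micro_batch (2) x grad_accum (4) x dp (2)
        "train_micro_batch_size_per_gpu": 2,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    },
)
# engine.train_batch(data_iter=...) then drives one pipelined training step.
```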
-
Hi @shawntan, great work on Scatter MoE. As newer models scale up in the number of parameters they use, I wanted to ask a question about what you put in the README: *does not include any additional …
-
Is there any way to perform tensor parallelism across multiple nodes instead of just within a single node? Any tips would be helpful!
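Nothing in PyTorch's `DeviceMesh`/DTensor machinery restricts a tensor-parallel group to a single node; TP is just communication-heavy, so it is usually kept on fast intra-node links. A minimal sketch, assuming PyTorch >= 2.2 and a `torchrun --nnodes=2 --nproc_per_node=8` launch (the mesh size and the toy `Linear` are placeholders):
```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import ColwiseParallel, parallelize_module

# One flat 16-way tensor-parallel mesh that spans both nodes (2 x 8 GPUs).
mesh = init_device_mesh("cuda", (16,), mesh_dim_names=("tp",))

# Shard a toy layer column-wise across all 16 ranks, cross-node included.
layer = parallelize_module(torch.nn.Linear(1024, 1024), mesh, ColwiseParallel())
```
Expect the all-gather/all-reduce traffic to cross the inter-node interconnect, so throughput will depend heavily on your network fabric.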
-
Scaling models requires that they be trained in data-parallel, pipeline-parallel, or tensor-parallel regimes. The last two, both forms of "model parallelism", require a single model to be shared across GPUs. Thi…
-
Arraymancer has become a key piece of the Nim ecosystem. Unfortunately, I do not have the time to develop it further, for several reasons:
- family: birth of a family member, death of hobby time.
- competin…
-
### 🐛 Describe the bug
```python
import os
os.environ['NCCL_DEBUG'] = 'WARN'  # surface NCCL warnings in the logs

import torch
from torch import nn
from torch import distributed as dist
from torch.distributed.device_mesh import …
-
I don't see a backward-pass speedup using NATTEN, even with a kernel size only half the input size when calling na3d(). I'm not sure if that's expected. Could anyone help clarify or confirm? Thanks!
-
### Your current environment
```text
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC ve…
-
### Question
Say I have a cluster with 8 GPUs but only 12 GB of VRAM each; can I still train LLaVA?
It seems that DeepSpeed can do various kinds of model parallelism (tensor parallelism, pipelining, etc.).
I won…
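For fitting a model onto many small cards, DeepSpeed's usual answer is ZeRO-3 sharding (optionally with CPU offload) rather than tensor or pipeline parallelism, since it partitions parameters, gradients, and optimizer state across all 8 GPUs. A hedged sketch of such a config; the batch sizes and offload choices are illustrative, not tuned for LLaVA:
```python
# Typically written to ds_config.json and passed via --deepspeed ds_config.json.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                             # shard params/grads/optimizer state
        "offload_param": {"device": "cpu"},     # spill parameters to host RAM
        "offload_optimizer": {"device": "cpu"}, # spill optimizer state to host RAM
    },
}
```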
-
Hi, I want to run one LLM across multiple machines.
Within one node, I want to use tensor parallelism for speedup.
Across multiple nodes, I want to use pipeline parallelism.
Is this supported? If s…
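This intra-node TP / inter-node PP layout is exactly what a 2-D process-group grid expresses, and frameworks such as Megatron-LM and vLLM (`--tensor-parallel-size` together with `--pipeline-parallel-size`) build equivalent groupings internally. A minimal sketch of the grouping itself with PyTorch's `DeviceMesh`, assuming 2 nodes x 8 GPUs under `torchrun`:
```python
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

# Shape (2, 8): the outer "pp" axis crosses nodes, the inner "tp" axis stays
# within a node (torchrun assigns ranks 0-7 to node 0 and 8-15 to node 1).
mesh = init_device_mesh("cuda", (2, 8), mesh_dim_names=("pp", "tp"))

tp_group = mesh.get_group("tp")  # 8 intra-node ranks: tensor parallelism
pp_group = mesh.get_group("pp")  # 2 cross-node ranks: pipeline parallelism
print(dist.get_rank(), dist.get_rank(tp_group), dist.get_rank(pp_group))
```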