-
Is there multi-GPU support? I don't know how to set it up without running a script.
-
**Context**
Gradient norm clipping is a popular technique for stabilizing training, which requires computing the total norm with respect to the model's gradients. This involves a norm reduction acros…
awgu updated
3 months ago
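To make the reduction concrete, here is a minimal sketch of total-norm clipping when gradients are split across several shards. It is plain Python rather than the PyTorch implementation: the nested lists stand in for per-rank gradient tensors, the outer `sum` stands in for an all-reduce across ranks, and the helper names (`local_sq_norm`, `clip_coefficient`, `clip_in_place`) are hypothetical. In PyTorch itself, single-process clipping is available as `torch.nn.utils.clip_grad_norm_`.

```python
import math


def local_sq_norm(grads):
    """Sum of squared entries over one shard's gradients (a list of flat lists)."""
    return sum(g * g for grad in grads for g in grad)


def clip_coefficient(shard_grads, max_norm, eps=1e-6):
    """Return (total_norm, scale) for gradients spread over several shards."""
    # Assumption: this sum plays the role of an all-reduce(SUM) across ranks.
    total_sq = sum(local_sq_norm(grads) for grads in shard_grads)
    total_norm = math.sqrt(total_sq)
    # Scale down only when the total norm exceeds max_norm.
    scale = min(1.0, max_norm / (total_norm + eps))
    return total_norm, scale


def clip_in_place(shard_grads, max_norm):
    """Rescale every gradient entry by the shared clip coefficient."""
    total_norm, scale = clip_coefficient(shard_grads, max_norm)
    for grads in shard_grads:
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale
    return total_norm


# Two hypothetical shards, each holding part of the model's gradients.
shards = [
    [[3.0, 0.0]],
    [[0.0, 4.0]],
]
total = clip_in_place(shards, max_norm=1.0)
clipped_norm = math.sqrt(sum(local_sq_norm(g) for g in shards))
```

Each shard only needs its local sum of squares, so a single scalar has to be communicated before every rank can apply the same scale factor.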
-
bitsandbytes installed successfully, but I get this error:
Error invalid configuration argument at line 117 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
![Screenshot 2023-12-12 17-15-21]…
-
### 🚀 The feature, motivation and pitch
As titled.
Could we implement op `aten::record_stream`?
cc @zhangxiaoli73
-
## Description:
In AutoGluon's multimodal framework, Distributed Data Parallel (DDP) is the primary strategy employed for leveraging multiple GPUs across most problem types. A known limitation of D…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
Hi, for example, I am training a job using this [yaml](https://github.com/mosaicml/diffusion/blob/main/yamls/hydra-yamls/SD-2-base-512.yaml). How do I continue training if the job fails? Thanks.
viyjy updated
10 months ago
-
As a follow-up to https://github.com/pytorch/torchdynamo/issues/887, which worked with eager.
## Repro
`pip install mosaicml`
```python
from torch.utils.data import DataLoader
from torchvision…
```
-
I am running out of memory on a Tesla T4. I have four of them, though, and I usually use accelerator for a multi-GPU setup. How can I use them for AnglE semantic similarity?
-
Hi,
Does the Shampoo implementation support HuggingFace's Accelerate library?
Can it be used in a call such as
`model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)`?
Thanks!