fsdp Search Results - GitHub Issues

1000+ results for fsdp

unslothai/unsloth #36

Multigpu

Is there multi-GPU support? I don't know how to set it up without running a script.

drewskidang updated 1 month ago
10
pytorch/pytorch #133586

No way for low-overhead total norm in native PyTorch with la…

**Context** Gradient norm clipping is a popular technique for stabilizing training, which requires computing the total norm with respect to the model's gradients. This involves a norm reduction acros…

awgu updated 3 months ago
1
bitsandbytes-foundation/bitsandbytes #911

bitsandbytes error

bitsandbytes installed successfully, but it fails with: Error invalid configuration argument at line 117 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu ![2023-12-12 17-15-21 screenshot]…

lucheng07082221 updated 2 weeks ago
6
intel/torch-xpu-ops #1055

We need op record_stream that is widely used in DDP\FSDP

### 🚀 The feature, motivation and pitch As titled. Could we implement op `aten::record_stream`? cc @zhangxiaoli73

guangyey updated 3 weeks ago
1
autogluon/autogluon #4082

[AutoMM] Enhancing Multi-GPU Support in Multimodal Training …

## Description: In AutoGluon's multimodal framework, Distributed Data Parallel (DDP) is the primary strategy employed for leveraging multiple GPUs across most problem types. A known limitation of D…

FANGAreNotGnu updated 7 months ago
1
axolotl-ai-cloud/axolotl #1031

ValueError: Attempting to unscale FP16 gradients.

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) didn't find any similar reports…

hengjiUSTC updated 1 month ago
12
mosaicml/diffusion #105

How to do continue training when a job failed

Hi, for example, I am training a job using this [yaml](https://github.com/mosaicml/diffusion/blob/main/yamls/hydra-yamls/SD-2-base-512.yaml). How do I continue training if this job fails? Thanks.

viyjy updated 10 months ago
1
pytorch/pytorch #93695

Composer inductor errors

As a followup to https://github.com/pytorch/torchdynamo/issues/887 which worked with eager ## Repro `pip install mosaicml` ```python from torch.utils.data import DataLoader from torchvision…

msaroufim updated 9 months ago
2
SeanLee97/AnglE #59

multi gpu use?

I am running out of memory on Tesla T4. I have 4 of them though and I usually use accelerator for multigpu setup. How can I use them for angle semantic similarity?

ganeshkrishnan1 updated 4 months ago
20
facebookresearch/optimizers #24

Using Shampoo with Accelerate and FSDP

Hi, Does the Shampoo implementation support HuggingFace's Accelerate library? Can it be used in: `model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)` ? Thanks!

kfirgoldberg updated 3 weeks ago
3
