-
## 🐛 Bug
Gemma-7b with FSDP zero3 trained on 2 nodes with 8 H100 GPUs each gives an OOM error at BS = 2 for both `thunder_cudnn` and `thunder_inductor_cat_cudnn`. The same configuration works for `inducto…
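For context, a minimal sketch of the zero3-style (FULL_SHARD) FSDP wrapping this benchmark refers to; the placeholder model and the `torchrun` launch are assumptions, not the actual benchmark harness:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# Assumes launch via torchrun across 2 nodes x 8 GPUs; the tiny Linear is a
# stand-in for Gemma-7b.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()
model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)  # zero3 equivalent
```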
-
## 🐛 Bug
When benchmarking the model 'Mixtral-8x7B-v0.1', we get OOM errors even with `--checkpoint_activations True`.
The same configuration works for torch.compile.
Might be related to [https://gi…
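For reference, a hedged sketch of what a `--checkpoint_activations`-style flag typically enables, using PyTorch's activation-checkpointing wrapper; the `Block` module is a hypothetical stand-in for a Mixtral decoder layer:

```python
import torch
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)

class Block(nn.Module):  # hypothetical stand-in for a decoder layer
    def __init__(self):
        super().__init__()
        self.ff = nn.Linear(64, 64)

    def forward(self, x):
        return torch.relu(self.ff(x))

model = nn.Sequential(Block(), Block())
# Recompute each Block's activations during backward instead of storing
# them, trading extra compute for lower peak memory.
apply_activation_checkpointing(
    model,
    checkpoint_wrapper_fn=checkpoint_wrapper,
    check_fn=lambda m: isinstance(m, Block),
)
```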
-
The FairScale FullyShardedDataParallel (FSDP) API supports large-model training and is being quickly adopted by internal and external users. The long-term goal of upstreaming the API to PyTorch is to rele…
-
### Willingness to contribute
No. I cannot contribute this feature at this time.
### Proposal Summary
This feature request proposes to add support for logging FullyShardedDataParallel models …
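As a rough illustration of what such support might wrap, a hedged sketch that gathers the full (unsharded) state dict on rank 0 before persisting it; `model` is assumed to be an FSDP-wrapped module, and the `torch.save` call stands in for whatever logging API the proposal targets:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

# Gather the full state dict on rank 0 only, offloaded to CPU, then persist
# it from that rank. Process group assumed already initialized.
cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
    state = model.state_dict()
if dist.get_rank() == 0:
    torch.save(state, "model_full.pt")  # placeholder for the logging call
```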
-
My understanding is that FSDP does not shard the model buffers; as a result, unlike parameters, which would be freed and go back to their sharded state after state_dict()/summon_full_params(), this …
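A small sketch of the behavior described, assuming `model` is an FSDP-wrapped module in an initialized process group:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Parameters are materialized only inside the context and are freed back to
# their sharded state on exit; buffers are never sharded, so their size is
# the same before, during, and after.
with FSDP.summon_full_params(model):
    full_numel = sum(p.numel() for p in model.parameters())
sharded_numel = sum(p.numel() for p in model.parameters())  # < full_numel
buffer_numel = sum(b.numel() for b in model.buffers())      # unchanged
```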
-
It's not obvious how one should instantiate an optimizer with parameter groups after instantiating `FSDP`.
The change in the linked PR #538 breaks the unit tests.
The examples/docs should either denote that …
-
While working on a model with FSDP wrapping, I ran into an illegal memory access crash that went away with `flatten=False`. I will be debugging it.
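For reference, a minimal sketch of the workaround, assuming FairScale's FSDP (where the constructor flag is `flatten_parameters`) and an initialized process group:

```python
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# flatten_parameters=False keeps each parameter as its own shard instead of
# one flat buffer; this is the setting that made the crash go away.
model = FSDP(nn.Linear(16, 16), flatten_parameters=False)
```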
-
## 🚀 Feature
FSDP should offer the possibility to compute the norms of the weights and the norms of the gradients on the fly, while the weights/gradients are available, with an option like `compute_weight_…
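Until such an option exists, a hedged sketch of doing it by hand for gradients: under FULL_SHARD each rank holds a disjoint shard, so the squared local norms can simply be summed across ranks. The helper name is hypothetical:

```python
import torch
import torch.distributed as dist

def global_grad_norm(model):  # hypothetical helper
    # Reduce over this rank's local shards, sum the squared norms across the
    # process group (valid because shards are disjoint under FULL_SHARD),
    # then take the square root.
    local_sq = torch.zeros(1, device="cuda")
    for p in model.parameters():
        if p.grad is not None:
            local_sq += p.grad.float().pow(2).sum()
    dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)
    return local_sq.sqrt().item()
```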
-
We are training text_to_image on Google Cloud Platform; the JupyterLab instance has 2 GPUs (NVIDIA Tesla P100) with 32 GB of memory in total (16 GB each). I tried using accelerate for training the text_t…
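For what it's worth, a minimal multi-GPU sketch with 🤗 Accelerate, launched via `accelerate launch script.py`; the tiny model and random data are placeholders for the actual text_to_image pipeline:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up both GPUs from the launch config
model = torch.nn.Linear(32, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = torch.utils.data.DataLoader(torch.randn(64, 32), batch_size=8)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for batch in loader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # placeholder loss
    accelerator.backward(loss)
    optimizer.step()
```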
-
### 🐛 Describe the bug
Hi,
When using `torch.compile` and `torch._dynamo.compiled_autograd` to trace the FSDP model with the backward gradient hooks, the following error occurred. According to t…
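A hedged single-GPU reduction of the pattern being exercised; the FSDP wrapping and the hooks themselves are omitted since the report is truncated:

```python
import torch
import torch._dynamo.compiled_autograd as compiled_autograd

def compiler_fn(gm):
    # Compile the autograd graph captured by compiled_autograd.
    return torch.compile(gm, backend="inductor")

model = torch.compile(torch.nn.Linear(8, 8))
loss = model(torch.randn(4, 8)).sum()
with compiled_autograd.enable(compiler_fn):
    loss.backward()  # backward pass (incl. gradient hooks) is traced here
```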