-
### 🐛 Describe the bug
I'm trying to use the DCP API, but models that directly own `nn.Parameter`s fail when more than one device is involved.
The following code will fail with
```raise…
-
### 🐛 Describe the bug
FSDP
- [ ] FSDP, autocast (MosaicML Diffusers) https://github.com/pytorch/pytorch/issues/110797
- **Error raised:** aot_autograd, r.grad = self.meta_tensor
- [ ] FSDP,…
-
### Context
Today, `FullyShardedDataParallel` (FSDP) supports meta device initialization via two paths, where the precondition is that the `module` passed to FSDP has some parameter on meta device:
…
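The meta-device precondition described above can be illustrated with a short sketch. This is plain PyTorch with no process group required; the actual FSDP wrapping call is only indicated in a comment, since it needs an initialized distributed environment:

```python
import torch
import torch.nn as nn

# Construct a module directly on the meta device: its parameters are
# shape/dtype placeholders with no storage allocated.
with torch.device("meta"):
    module = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# FSDP's precondition: at least one parameter lives on the meta device.
has_meta_param = any(p.is_meta for p in module.parameters())
print(has_meta_param)  # True

# FSDP would then materialize the sharded parameters on the real device,
# roughly (assumed usage; requires an initialized process group):
# fsdp_module = FSDP(module, device_id=torch.cuda.current_device())
```

Because meta tensors carry no data, this lets very large models be described before any real memory is committed, which is the point of the two initialization paths.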
-
Oftentimes, one wants to combine a more general `n`-way data parallelism with `m`-way model parallelism, as helpfully explained in the official JAX [docs](https://jax.readthedocs.io/en/latest/notebooks/Distribu…
-
### 🐛 Describe the bug
When running the test suite of PyTorch 1.12.1, I get failures such as:
```
distributed/fsdp/test_fsdp_input failed!
distributed/fsdp/test_fsdp_mixed_precision failed!
```
Tracing …
-
https://github.com/pytorch/pytorch/blob/935f6977542affc0d16c66333a13d60dae6aa5fa/torch/distributed/fsdp/wrap.py#L561
When calling the FSDP class recursively, `ignored_param` (which is supposed to be pa…
-
### 🐛 Describe the bug
When we ignore modules with any trainable parameters in FSDP, an error occurs when we try to continue training after loading a distributed checkpoint for the optimizer.
…
-
We are training text_to_image on Google Cloud Platform. The JupyterLab instance has 2 GPUs (NVIDIA Tesla P100) with 32 GB of total memory (16 GB each). I tried using accelerate for training the text_t…
-
This issue is to track a few follow-ups regarding `ignored_modules`.
1. Users may want to ignore specific parameters or buffers within a module. How should we modify the API to accommodate this?
2…
-
Similar to DDP, we can add an FSDP logging data API to expose FSDP internal states, performance metrics, and meta information.
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen …
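For reference, the DDP counterpart being alluded to is the private `_get_ddp_logging_data()` method on `DistributedDataParallel`. A minimal sketch of its use, assuming a single-process gloo group purely for illustration (an FSDP API would presumably look similar, but none of the FSDP-side names here are settled):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process gloo group, just enough to construct a DDP module on CPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 2))

# Private DDP API exposing internal state and performance counters;
# the proposed FSDP API would expose analogous data for FSDP internals.
logging_data = model._get_ddp_logging_data()
print(type(logging_data))

dist.destroy_process_group()
```

Mirroring this shape for FSDP would keep the two data-parallel wrappers' observability stories consistent.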