-
Hi,
Thanks for the amazing repo!
I want to use MoE but cannot find an example. Is it possible to provide a tutorial/example showing how to use MoE? For example, how to define main, training …
-
### Bug description
When running multi-node/multi-GPU training with a different number of GPUs on each node, the `Fabric` `ddp` and `fsdp` strategies will have an incorrect `num_replicas` in `distributed_sampler_kwar…
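For context, a minimal sketch of the kind of setup being described; the launch configuration, the device counts, and the `fabric.strategy.distributed_sampler_kwargs` access are assumptions for illustration, not a verified repro:

```
# Hypothetical two-node setup, e.g. 4 GPUs on node 0 and 2 GPUs on node 1,
# launched separately on each node (exact launcher is an assumption).
import torch
import lightning as L
from torch.utils.data import DataLoader, TensorDataset

fabric = L.Fabric(accelerator="cuda", strategy="ddp", devices="auto", num_nodes=2)
fabric.launch()

# The world size here would be 6, but the sampler kwargs are derived from
# num_nodes * local device count, which differs between the two nodes.
print(fabric.strategy.distributed_sampler_kwargs)

dataset = TensorDataset(torch.arange(100).float())
# setup_dataloaders() injects a DistributedSampler built from those kwargs,
# so ranks can end up with inconsistent shards of the data.
loader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=4))
```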
-
Hi, I'm trying to train a fully sharded transformer. At the beginning, I started training the model with use_shard_state=False, but it failed when trying to save the checkpoint, since there are several f…
-
## ❓ Questions and Help
[Distributed support of DeepSpeed on XLA] Hello, does DeepSpeed support distributed training on XLA? If not, could you add support for this?
-
Here is a list of flaky tests that we should fix in our next fix-a-thon.
- test_shared_weight_mevo[optim_state-flat]
- test_regnet[pytorch-flatten-mixed]
- test_shared_weight_mevo[train-none]
- …
-
### 🚀 The feature, motivation and pitch
The DDP gradient bucket always resides in GPU HBM, and its size equals the sum of the sizes of all of the module's weight gradients.
During the forward stage and the optimizer stage, this memory is wast…
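To illustrate the claim, a rough measurement sketch (not from the issue; the model size and the use of `memory_allocated()` as a proxy are assumptions):

```
# Sketch: DDP's Reducer allocates flattened gradient buckets when the model is
# wrapped, and they stay resident in GPU memory across forward and optimizer steps.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(4096, 4096).cuda()
before = torch.cuda.memory_allocated()
ddp_model = DDP(model)  # bucket buffers (~ total gradient size) are allocated here
after = torch.cuda.memory_allocated()
print(f"extra memory after DDP wrap: {(after - before) / 2**20:.1f} MiB")
```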
-
Repost from Slack as requested:
First, my environment, in case it's relevant: Jamf Cloud with a secondary local SMB FSDP (file share distribution point). The MacBook running AutoPkg is an M1 running Ventura.
I'm getting [Errno 2] …
-
With FSDP, the code currently cannot be run.
If you try to add model compilation to the [training](https://github.com/Lightning-AI/lit-llama/blob/main/train.py) script like:
```
...
fabric = L.Fabric…
```
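For reference, a minimal sketch of the kind of change being described; the strategy arguments, the model construction, and the point where `torch.compile` is applied are assumptions rather than the exact code from `train.py`:

```
import lightning as L
import torch
from functools import partial
from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from lit_llama.model import Block, LLaMA, LLaMAConfig

# Wrap each transformer Block separately under FSDP (assumed policy).
wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
fabric = L.Fabric(accelerator="cuda", devices=8,
                  strategy=FSDPStrategy(auto_wrap_policy=wrap_policy))
fabric.launch()

with fabric.device:
    model = LLaMA(LLaMAConfig.from_name("7B"))

# Compiling the model is the added step that triggers the failure with FSDP.
model = torch.compile(model)
model = fabric.setup(model)
```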
-
### 🐛 Describe the bug
I was trying to use torch.compile + FSDP + a Hugging Face transformer. I was able to make it work on one GPU; however, on 8 A100 GPUs, I ran into the following errors. I made a re…
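A minimal sketch of the combination being described, assuming a standard `torchrun` launch, an arbitrary Hugging Face model, and default FSDP wrapping (none of these details are confirmed by the report):

```
# Launch (assumed): torchrun --nproc_per_node=8 repro.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()
model = FSDP(model, device_id=torch.cuda.current_device())
model = torch.compile(model)  # fine on a single GPU, errors on 8 GPUs per the report

batch = torch.randint(0, 50257, (2, 128), device="cuda")
out = model(input_ids=batch, labels=batch)
out.loss.backward()
```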
-
### 🚀 The feature, motivation and pitch
FSDP optimizer checkpoint loading expects params to be keyed by FQN, but DDP saves checkpoints with param IDs.
FSDP does provide `rekey_optim_state_dict` to…
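For reference, a sketch of the conversion path that `rekey_optim_state_dict` enables, i.e. turning a DDP-saved, param-ID-keyed optimizer state dict into an FQN-keyed one; the checkpoint path and the `build_model()` helper are placeholders:

```
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, OptimStateKeyType

model = build_model()  # hypothetical factory for the unwrapped model
ddp_osd = torch.load("ddp_ckpt.pt")["optimizer"]  # optimizer state keyed by param IDs

# Rekey to fully qualified parameter names (FQNs), the keying FSDP expects.
fqn_osd = FSDP.rekey_optim_state_dict(ddp_osd, OptimStateKeyType.PARAM_NAME, model)

# The FQN-keyed dict can then be sharded for the FSDP-wrapped model,
# e.g. via FSDP.shard_full_optim_state_dict, and loaded into the optimizer.
```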