-
This is a bit of a technical challenge and/or question. Both I-JEPA and V-JEPA use DDP rather than FSDP. This puts an inherent cap on the size of the models that can be trained: the memory of a single GPU.
I'm …
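For a rough sense of what the switch would involve (this is only a sketch with a placeholder `Block` class and a generic wrapping policy, not the I-JEPA/V-JEPA code), PyTorch's FSDP shards each wrapped module's parameters across ranks instead of replicating the whole model the way DDP does:

```
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

class Block(nn.Module):  # stand-in for a transformer block, not the real encoder
    def __init__(self, dim=1024):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.mlp(x)

# run under torchrun so the process group exists
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

encoder = nn.Sequential(*[Block() for _ in range(24)]).cuda()

# DDP keeps a full copy of the model on every GPU:
#   model = nn.parallel.DistributedDataParallel(encoder)
# FSDP instead shards each wrapped Block's parameters across ranks,
# so a single GPU no longer has to hold the entire model.
model = FSDP(encoder, auto_wrap_policy=ModuleWrapPolicy({Block}))
```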
-
### Describe the bug
I run the training but get this error
### Reproduction
Run `accelerate config`
```
compute_environment: LOCAL_MACHINE
debug: true
distributed_type: FSDP
downcast_bf16: '…
```
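For reference, the same FSDP settings can also be passed programmatically instead of through `accelerate config`. This is only a minimal sketch with assumed values, not the configuration from the report, and the plugin's field names and accepted value types can vary between accelerate versions:

```
from accelerate import Accelerator, FullyShardedDataParallelPlugin
from torch.distributed.fsdp import ShardingStrategy

# Hypothetical plugin settings (not the reporter's config).
fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # shard params, grads, optimizer state
    state_dict_type="FULL_STATE_DICT",              # gather a full state dict on save
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
# launch with `accelerate launch` so the distributed environment is set up, then:
# model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
```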
-
The old codepath is not composable with other transforms, does not offer gathering of state dicts as easily, etc.
Removing it, of course, depends on NVIDIA benchmarking not needing it. I think we (@crc…
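On the state-dict point: with PyTorch's own FSDP, the usual way to gather a full, unsharded state dict is the `state_dict_type` context manager. A minimal sketch, independent of the codepaths being discussed here:

```
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

# run under torchrun; the model here is a toy placeholder
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
model = FSDP(nn.Linear(1024, 1024).cuda())

# Gather the complete, unsharded weights, offloaded to CPU and only on rank 0,
# so saving does not blow up GPU memory on every rank.
cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
    full_sd = model.state_dict()
if dist.get_rank() == 0:
    torch.save(full_sd, "checkpoint.pt")
```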
-
[torch-neuronx] FSDP support - Distributed Training on Trn1
-
FSDP2 supports all-gather using FP8:
https://discuss.pytorch.org/t/distributed-w-torchtitan-enabling-float8-all-gather-in-fsdp2/209323
Wondering if we could do this directly using TransformerEngine …
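For reference, the route in the linked post goes through torchao's float8 support rather than TransformerEngine. A rough sketch, assuming recent torch/torchao versions (the `fully_shard` module path has moved between releases, and the toy model here is an assumption):

```
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard          # FSDP2
from torchao.float8 import Float8LinearConfig, convert_to_float8_training

# run under torchrun on FP8-capable GPUs (H100-class)
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Sequential(*[nn.Linear(4096, 4096, bias=False) for _ in range(8)]).cuda().bfloat16()

# Swap nn.Linear for Float8Linear and ask FSDP2 to communicate the weights in
# FP8 so the all-gather itself runs at 8 bits.
config = Float8LinearConfig(enable_fsdp_float8_all_gather=True)
convert_to_float8_training(model, config=config)

# Shard each layer and then the root module with FSDP2's fully_shard.
for layer in model:
    fully_shard(layer)
fully_shard(model)
```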
-
### Bug description
When using the FSDP strategy with HYBRID_SHARD set, the loss behaves as if only one node is training. When it is set to FULL_SHARD, etc., the loss drops as expected when more nodes a…
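For context, HYBRID_SHARD is supposed to shard within each node and replicate across nodes (with a DDP-style gradient all-reduce between the replicas), so a loss curve that looks like single-node training suggests the inter-node replication is not taking effect. A minimal raw-PyTorch sketch of the setting (not the reporter's trainer code):

```
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# run under torchrun across two or more nodes
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()

# HYBRID_SHARD: shard parameters inside each node, replicate across nodes and
# all-reduce gradients between the replicas (FULL_SHARD shards across all ranks).
model = FSDP(model, sharding_strategy=ShardingStrategy.HYBRID_SHARD)
```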
-
I'm running nccl-test `all-reduce` between two nodes, and I've found that the tree algorithm performs much better than the ring algorithm. However, through reading the NCCL source code, I noticed tha…
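Not the nccl-tests binary itself, but one way to sanity-check the same comparison from PyTorch is to pin `NCCL_ALGO` before the communicator is created and time the collective. A rough sketch with assumed message size and iteration counts:

```
import os
import time
import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_ALGO", "Tree")   # or "Ring"; must be set before NCCL init
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

x = torch.randn(256 * 1024 * 1024 // 4, device="cuda")  # ~256 MB of fp32
for _ in range(5):                                       # warm-up
    dist.all_reduce(x)
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(20):
    dist.all_reduce(x)
torch.cuda.synchronize()
if rank == 0:
    ms = (time.perf_counter() - t0) / 20 * 1e3
    print(f"NCCL_ALGO={os.environ['NCCL_ALGO']}: {ms:.2f} ms per all_reduce")
dist.destroy_process_group()
```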
-
### 🐛 Describe the bug
Simple compilation of the UNet model works fine, but the FSDP-wrapped UNet gets recompiled on every block. In a real setup the cache-size limit is rapidly reached.
Code:
```
import argp…
```
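Since the original script is cut off above, here is a hypothetical minimal stand-in (a toy `TinyUNet`, not the reporter's model) showing the pattern being described: the plain model compiles as one unit, but once every block sits in its own FSDP unit, compilation proceeds block by block and the cache-size limit is hit quickly.

```
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

class Block(nn.Module):                     # toy stand-in for a UNet block
    def __init__(self, ch=64):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x)) + x

class TinyUNet(nn.Module):                  # hypothetical, not the real UNet
    def __init__(self, ch=64, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(Block(ch) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

# run under torchrun
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = TinyUNet().cuda()

# Compiling the plain model produces a single compiled unit:
#   compiled = torch.compile(model)
# Wrapping every Block in its own FSDP unit before compiling is the setup in
# which per-block recompilations are being reported.
model = FSDP(model, auto_wrap_policy=ModuleWrapPolicy({Block}), use_orig_params=True)
compiled = torch.compile(model)
out = compiled(torch.randn(2, 64, 32, 32, device="cuda"))
```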
-
When adding the NVMe drives, we changed which cards are installed in nodes 01 and 02, and also removed the bifurcating PCI-e card from the Mellanox cards in nodes 09 and 10. We need to update the mach…
-
I was wondering if PyTorch's FullyShardedDataParallel (FSDP) is supported by TransformerEngine, especially whether FP8 can work with FSDP. Thank you in advance.
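Not an authoritative answer, but the combination being asked about would look roughly like the sketch below: a model built from `te.Linear` layers, wrapped in FSDP, with the forward pass run under `te.fp8_autocast`. Whether the FP8 scaling state interacts correctly with FSDP's sharding is exactly the open question, so treat this as an illustration of the setup rather than a confirmation that it works; the model shape and recipe values are assumptions.

```
import torch
import torch.distributed as dist
import torch.nn as nn
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# run under torchrun on FP8-capable GPUs (H100 etc.)
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Sequential(*[te.Linear(4096, 4096, bias=True) for _ in range(4)]).cuda()
model = FSDP(model)                      # shard the TE modules like any nn.Module

recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16, amax_compute_algo="max")
x = torch.randn(8, 4096, device="cuda")

# FP8 is enabled per forward pass via the autocast context, not at wrap time.
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = model(x)
y.sum().backward()
```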