-
Dear torchtitan team, I have a question regarding gradient norm clipping when using pipeline parallelism (PP) potentially combined with `FSDP/DP/TP`.
For simplicity, let's assume each process/GPU h…
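For reference, the usual recipe when parameters are split across ranks is: each rank computes the squared norm of the gradients it owns, the squared norms are summed across ranks (an `all_reduce(SUM)` in practice), and every rank derives the same clip coefficient from the global norm. A minimal single-process sketch with a hypothetical helper (the `all_reduce` is simulated by a plain `sum`):

```python
import math

def global_clip_coef(local_sq_norms, max_norm, eps=1e-6):
    """Combine per-rank squared gradient norms into one clip coefficient.

    In a real PP/FSDP/TP setup, sum(local_sq_norms) would be an
    all_reduce(SUM) over the ranks owning disjoint parameter shards.
    """
    total_norm = math.sqrt(sum(local_sq_norms))
    coef = min(1.0, max_norm / (total_norm + eps))
    return coef, total_norm

# Two ranks holding shards with squared norms 9 and 16: global norm is 5,
# so with max_norm=1 every rank scales its gradients by ~0.2.
coef, norm = global_clip_coef([9.0, 16.0], max_norm=1.0)
```

Each rank then multiplies its local gradients by `coef`; because every rank sees the same global norm, the scaling stays consistent across the parallelism dimensions.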
-
When I am trying to train a model with FSDP, I get the following error:
*** TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
It happens on this specific line:
…
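For context, that `TypeError` fires whenever the second argument to `isinstance` is not a type, e.g. when a class lookup resolved to `None`. A minimal, hypothetical repro (unrelated to the exact FSDP line, which is truncated above):

```python
# Hypothetical: an auto-wrap layer class that failed to resolve.
layer_cls = None

try:
    isinstance(object(), layer_cls)
except TypeError as err:
    # e.g. "isinstance() arg 2 must be a type, a tuple of types, or a union"
    message = str(err)
```

A common cause in FSDP setups is passing `None` (or a string) where a class is expected, such as in `transformer_auto_wrap_policy`'s set of layer classes.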
-
## Issue description
When I run distributed training and simply set `CUDA_VISIBLE_DEVICES` in each rank:
- Running `torch.distributed.barrier()` makes rank 1 occupy GPU memory on the GPU of rank 0…
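One common fix (a sketch, not verified against this exact setup) is to pin each process to its GPU before the first collective and pass `device_ids` to the barrier, so NCCL does not create a default context on device 0:

```python
import torch
import torch.distributed as dist

def setup_rank(local_rank: int) -> None:
    # Pin this process to its own GPU *before* init / any NCCL collective;
    # otherwise the default CUDA context (and some workspace memory) can
    # land on device 0 for every rank.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    # device_ids keeps the barrier's work on this rank's GPU.
    dist.barrier(device_ids=[local_rank])
```

With `CUDA_VISIBLE_DEVICES` set per rank, `local_rank` would typically be 0 inside each process; the key point is calling `torch.cuda.set_device` before any collective runs.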
-
Some transforms, notably the FSDP and TensorParallel ones, change shapes but currently do not propagate those updates everywhere (the linear that follows is updated, but the activation etc. are not).
We might con…
-
FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
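One way to address the warning (a sketch; the exact key name and accepted values depend on the `transformers` version) is to move the setting into an `fsdp_config` mapping instead of the CLI flag:

```python
# Hypothetical replacement for the deprecated CLI flag; "BertLayer" is
# only an example transformer-layer class name, not taken from the source.
fsdp_config = {
    "transformer_layer_cls_to_wrap": ["BertLayer"],
}
# This dict (or an equivalent JSON file) would then be passed as the
# fsdp_config argument of TrainingArguments.
```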
-
Running the script [3.test_cases/10.FSDP/1.distributed-training.sbatch](https://github.com/aws-samples/awsome-distributed-training/blob/main/3.test_cases/10.FSDP/1.distributed-training.sbatch) on 2 p5…
-
## ❓ Questions and Help
FSDP can be expressed well in SPMD, but HSDP does not seem to be expressible. Is there any way to express HSDP in SPMD?
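HSDP can usually be written in SPMD as a 2D device mesh: replicate across one axis (plain DP between groups) and shard across the other (FSDP within a group). A sketch, with assumed axis names and an assumed 4×2 split; the `xla_force_host_platform_device_count` flag just fakes 8 CPU devices so the snippet runs anywhere:

```python
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 2D mesh: "replica" behaves like DP across groups, "shard" like FSDP
# within a group.
devices = np.array(jax.devices()).reshape(4, 2)
mesh = Mesh(devices, axis_names=("replica", "shard"))

# Parameters sharded along "shard" and implicitly replicated along
# "replica" -- which is exactly the HSDP layout.
param_sharding = NamedSharding(mesh, P("shard"))
```

Gradients would then be reduced over both axes, while parameter all-gathers only run over the "shard" axis.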
-
I have 6 4090 GPUs (VRAM = 120GB). However, when I try to finetune the model, I get a "CUDA out of memory" error.
How much VRAM is needed to train the ViT backbone model? I want to know how many GPU…
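As a rough, model-independent estimate (a rule of thumb, not a measurement for this specific ViT): full-precision AdamW training needs about 16 bytes per parameter for weights, gradients, and the two optimizer moments, before counting activations:

```python
def adamw_state_gib(n_params: float) -> float:
    # 4 B weights + 4 B grads + 8 B Adam moments = 16 B per parameter;
    # activation memory comes on top and often dominates for ViTs
    # trained with large batch sizes.
    return n_params * 16 / 2**30

# e.g. a hypothetical 300M-parameter backbone:
state_gib = adamw_state_gib(300e6)  # ~4.5 GiB before activations
```

Mixed precision, gradient checkpointing, and sharding the optimizer state (FSDP/ZeRO) all reduce the per-GPU footprint substantially.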
-
Currently our FSDP implementation uses JAX's sharding stuff, which requires that the embed axis be divisible by the number of devices (or really data axis size)
Usually this is fine, but recently @…
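When the embed axis does not divide evenly, one workaround is to pad it up to the next multiple of the data-axis size (a sketch with a hypothetical helper; the padded columns would be masked or simply ignored):

```python
def pad_to_multiple(dim: int, data_axis_size: int) -> int:
    # Round the embed axis up so it divides evenly across devices,
    # satisfying the sharding divisibility requirement.
    return -(-dim // data_axis_size) * data_axis_size

padded = pad_to_multiple(4097, 8)  # -> 4104
```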
-
@svenstaro
I would like to ask why my GPU memory usage is lower in DP mode than in FSDP mode.
`model = DP(model)`
`model = FSDP(model, auto_wrap_policy=my_auto_wrap_policy,`
…
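One thing worth checking (a sketch, assuming PyTorch's built-in `size_based_auto_wrap_policy`; the 1M threshold is an arbitrary example): if the auto-wrap policy effectively wraps the model as a single FSDP unit, FSDP must all-gather every parameter at once, and its peak memory can exceed plain DP. Finer-grained wrapping materializes only one unit's full parameters at a time:

```python
import functools
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# Wrap any submodule with >= 1M parameters as its own FSDP unit, so only
# one unit's full parameters are gathered at a time during forward/backward.
my_auto_wrap_policy = functools.partial(
    size_based_auto_wrap_policy, min_num_params=1_000_000
)
# model = FSDP(model, auto_wrap_policy=my_auto_wrap_policy)
```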