-
### 🚀 The feature, motivation and pitch
FSDP optimizer checkpoint loading expects params to be keyed by FQN, but DDP saves checkpoints with param IDs.
FSDP does provide `rekey_optim_state_dict` to…
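A minimal sketch of the rekeying path being referenced, assuming a DDP-style checkpoint whose optimizer state is keyed by param ID; the checkpoint path and the tiny model are placeholders:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, OptimStateKeyType

# Assumes a torchrun-style launch so the process group env vars exist.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = FSDP(torch.nn.Linear(16, 16).cuda())      # placeholder FSDP-wrapped model
ckpt = torch.load("ddp_checkpoint.pt")            # placeholder: optimizer state keyed by param ID

# Rekey the param-ID-keyed optimizer state dict to FQN keys so that FSDP's
# optimizer-state loading utilities can consume it.
osd_by_name = FSDP.rekey_optim_state_dict(
    ckpt["optimizer"],
    OptimStateKeyType.PARAM_NAME,
    model,
)
```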
-
### 🐛 Describe the bug
I was trying to use torch.compile + FSDP + a Hugging Face transformer. I was able to make it work on one GPU; however, on 8 A100 GPUs, I ran into the following errors. I made a re…
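A hedged reduction of the setup being described (model name and launch details are assumptions; the real repro is truncated above):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

# Assumes a torchrun launch with one process per GPU.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()   # placeholder HF model
model = FSDP(model, use_orig_params=True)   # use_orig_params is generally needed with torch.compile
model = torch.compile(model)
```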
-
As noted in #689, convert_to_singleton doesn't produce state dicts with compatible keys (for some unknown reason).
Since reshard_mp can do the same job without the GPU node requirement of convert_t…
-
### 🐛 Describe the bug
When iterating on FSDP code, it's sometimes useful to set world_size = 1 to sanity-check some things before launching a larger job. However, this currently requires switching t…
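For reference, a single-process sanity-check sketch (addresses, port, and the tiny model are assumptions; no torchrun needed):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Initialize a world_size = 1 process group by hand, then wrap and run as usual.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

model = FSDP(torch.nn.Linear(8, 8).cuda())
out = model(torch.randn(4, 8, device="cuda"))
print(out.shape)
```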
-
https://pytorch.org/docs/stable/fsdp.html
This should allow us to scale to bigger models.
It would be quite useful to look into.
-
### 🐛 Describe the bug
## Description
There appears to be a bug in the `FullyShardedDataParallel` (FSDP) wrapper in PyTorch when accessing the inner module's state dict with `use_orig_params=True`…
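A hedged sketch of the access pattern being described (the actual repro is truncated above; this only shows reading the state dict from the inner module versus the wrapper, assuming a torchrun launch):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

inner = torch.nn.Linear(16, 16).cuda()
wrapped = FSDP(inner, use_orig_params=True)

sd_wrapper = wrapped.state_dict()        # state dict via the FSDP wrapper
sd_inner = wrapped.module.state_dict()   # state dict via the inner module
print(list(sd_wrapper.keys()), list(sd_inner.keys()))
```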
-
## Describe the bug
Boolean values in fsdp config (https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/fsdp_config.json#L4-L6) are represented as string values. This doe…
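An illustration of the pitfall in plain Python (the keys below are hypothetical, not the actual contents of the linked file): a JSON string such as "false" is non-empty and therefore truthy, so naive use of the parsed config silently enables options that were meant to be disabled.

```python
import json

cfg = json.loads('{"some_fsdp_flag": "false", "another_fsdp_flag": "true"}')

for key, value in cfg.items():
    # Both lines print "treated as True" because any non-empty string is truthy.
    print(key, repr(value), "-> treated as", bool(value))
```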
-
The parameters are:
torchrun --nproc_per_node=1 --master_port=20001 FastChat/fastchat/train/train_mem.py --model_name_or_path /home/wanghaikuan/vicuna-7b --data_path /home/wanghaikuan/chat/playg…
-
### 🐛 Describe the bug
I want to train a model on HPC using SLURM and Accelerate to configure FSDP. However, no matter how I change the configuration, it seems not to have much effect on CUDA memory u…
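For context, a minimal sketch of configuring FSDP programmatically through Accelerate (the plugin fields and the tiny model are assumptions about the poster's setup; it would be run under `accelerate launch` or srun):

```python
import torch
from torch.distributed.fsdp import ShardingStrategy, CPUOffload
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads, optimizer state
    cpu_offload=CPUOffload(offload_params=False),
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(1024, 1024)                  # placeholder model
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)
```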
-
### 🚀 The feature, motivation and pitch
When using FSDP, the model needs to be loaded on CPU first, but every process loads its own copy, which requires 8x the CPU memory on an 8-GPU machine and causes insufficient CPU memory. Is…
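One common way to avoid N full copies in CPU RAM is to materialize real weights only on rank 0 and let FSDP broadcast them via `sync_module_states=True`; a sketch under that assumption follows (`build_model` is a placeholder for the real, large model):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_model():
    # Placeholder for the real model; imagine a from_pretrained(...) call here.
    return torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.Linear(1024, 1024))

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

if rank == 0:
    model = build_model()                       # full weights in CPU RAM, once
else:
    with torch.device("meta"):
        model = build_model()                   # no parameter memory allocated

model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,                    # broadcast rank 0's weights to all ranks
    param_init_fn=lambda m: m.to_empty(device=torch.device("cuda"), recurse=False),
)
```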