-
### 🚀 The feature, motivation and pitch
FSDP optimizer checkpoint loading expects params to be keyed by FQN, but DDP saves checkpoints with param IDs.
FSDP does provide `rekey_optim_state_dict` to…
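A minimal sketch of the rekeying path being referenced, assuming a DDP-style checkpoint whose optimizer state is keyed by param ID; the checkpoint path and the tiny model are placeholders:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, OptimStateKeyType

# Assumes a torchrun-style launch so the process group env vars exist.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = FSDP(torch.nn.Linear(16, 16).cuda())      # placeholder FSDP-wrapped model
ckpt = torch.load("ddp_checkpoint.pt")            # placeholder: optimizer state keyed by param ID

# Rekey the param-ID-keyed optimizer state dict to FQN keys so that FSDP's
# optimizer-state loading utilities can consume it.
osd_by_name = FSDP.rekey_optim_state_dict(
    ckpt["optimizer"],
    OptimStateKeyType.PARAM_NAME,
    model,
)
```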
-
### 🐛 Describe the bug
I was trying to use torch.compile + FSDP + a Hugging Face transformer. I was able to make it work on one GPU; however, on 8 A100 GPUs, I ran into the following errors. I made a re…
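A hedged reduction of the setup being described (model name and launch details are assumptions; the real repro is truncated above):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

# Assumes a torchrun launch with one process per GPU.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()   # placeholder HF model
model = FSDP(model, use_orig_params=True)   # use_orig_params is generally needed with torch.compile
model = torch.compile(model)
```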
-
As noted in #689, convert_to_singleton doesn't produce state dicts with compatible keys (for some unknown reason).
Since reshard_mp can do the same job without the GPU node requirement of convert_t…
-
### 🐛 Describe the bug
When iterating on FSDP code, it's sometimes useful to set world_size = 1 to sanity-check some things before launching a larger job. However, this currently requires switching t…
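For reference, a single-process sanity-check sketch (addresses, port, and the tiny model are assumptions; no torchrun needed):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Initialize a world_size = 1 process group by hand, then wrap and run as usual.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

model = FSDP(torch.nn.Linear(8, 8).cuda())
out = model(torch.randn(4, 8, device="cuda"))
print(out.shape)
```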
-
https://pytorch.org/docs/stable/fsdp.html
This should allow us to scale to bigger models.
It would be quite useful to look into.
-
### 🐛 Describe the bug
## Description
There appears to be a bug in the `FullyShardedDataParallel` (FSDP) wrapper in PyTorch when accessing the inner module's state dict with `use_orig_params=True`…
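A hedged sketch of the access pattern being described (the actual repro is truncated above; this only shows reading the state dict from the inner module versus the wrapper, assuming a torchrun launch):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

inner = torch.nn.Linear(16, 16).cuda()
wrapped = FSDP(inner, use_orig_params=True)

sd_wrapper = wrapped.state_dict()        # state dict via the FSDP wrapper
sd_inner = wrapped.module.state_dict()   # state dict via the inner module
print(list(sd_wrapper.keys()), list(sd_inner.keys()))
```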
-
## Describe the bug
Boolean values in fsdp config (https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/fsdp_config.json#L4-L6) are represented as string values. This doe…
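An illustration of the pitfall in plain Python (the keys below are hypothetical, not the actual contents of the linked file): a JSON string such as "false" is non-empty and therefore truthy, so naive use of the parsed config silently enables options that were meant to be disabled.

```python
import json

cfg = json.loads('{"some_fsdp_flag": "false", "another_fsdp_flag": "true"}')

for key, value in cfg.items():
    # Both lines print "treated as True" because any non-empty string is truthy.
    print(key, repr(value), "-> treated as", bool(value))
```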
-
The parameters are:
torchrun --nproc_per_node=1 --master_port=20001 FastChat/fastchat/train/train_mem.py --model_name_or_path /home/wanghaikuan/vicuna-7b --data_path /home/wanghaikuan/chat/playg…
-
### 🐛 Describe the bug
I want to train a model on HPC using SLURM and Accelerate to configure FSDP. However, no matter how I change the configuration, it seems not to have much effect on CUDA memory u…
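For context, a minimal sketch of configuring FSDP programmatically through Accelerate (the plugin fields and the tiny model are assumptions about the poster's setup; it would be run under `accelerate launch` or srun):

```python
import torch
from torch.distributed.fsdp import ShardingStrategy, CPUOffload
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads, optimizer state
    cpu_offload=CPUOffload(offload_params=False),
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(1024, 1024)                  # placeholder model
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)
```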
-
### 🚀 The feature, motivation and pitch
When using FSDP, the model needs to be loaded on CPU first, but every process loads its own copy, which requires 8x the CPU memory on an 8-GPU machine and causes insufficient CPU memory. Is…
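One common way to avoid N full copies in CPU RAM is to materialize real weights only on rank 0 and let FSDP broadcast them via `sync_module_states=True`; a sketch under that assumption follows (`build_model` is a placeholder for the real, large model):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_model():
    # Placeholder for the real model; imagine a from_pretrained(...) call here.
    return torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.Linear(1024, 1024))

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

if rank == 0:
    model = build_model()                       # full weights in CPU RAM, once
else:
    with torch.device("meta"):
        model = build_model()                   # no parameter memory allocated

model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,                    # broadcast rank 0's weights to all ranks
    param_init_fn=lambda m: m.to_empty(device=torch.device("cuda"), recurse=False),
)
```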