-
### Bug description
`FSDPStrategy.load_checkpoint` casts `checkpoint_path` to a `pathlib.Path` [here](https://github.com/Lightning-AI/lightning/blob/master/src/lightning/pytorch/strategies/fsdp.py#…
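Since the rest of the report is truncated here, the following is only a minimal sketch of why coercing a path-like string to `pathlib.Path` can be lossy, using a hypothetical `s3://` URI (not necessarily the exact case hit in this issue):

```python
from pathlib import Path

# Hypothetical remote checkpoint URI, used purely for illustration.
checkpoint_path = "s3://my-bucket/checkpoints/epoch=3.ckpt"

# Casting the URI to pathlib.Path collapses the "//" after the scheme,
# so the original location can no longer be recovered from the Path object.
as_path = Path(checkpoint_path)
print(as_path)  # s3:/my-bucket/checkpoints/epoch=3.ckpt
```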
-
For distributed recipes, such as full_finetune_distributed, the gradients end up getting synchronized after each backward() pass instead of only once before the optimizer step. This results in signifi…
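If the goal is to accumulate gradients and synchronize only once, FSDP1 exposes a `no_sync()` context manager that defers gradient communication until the final micro-batch. A minimal sketch, assuming `model` is already FSDP-wrapped and `micro_batches`/`loss_fn` stand in for the recipe's own objects:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def accumulate_and_step(model: FSDP, optimizer, micro_batches, loss_fn):
    optimizer.zero_grad()
    *head, last = micro_batches
    # Skip gradient synchronization for all but the final micro-batch.
    with model.no_sync():
        for batch in head:
            loss_fn(model(batch)).backward()
    # Gradients are reduced across ranks only on this last backward().
    loss_fn(model(last)).backward()
    optimizer.step()
```

Note that under FSDP, `no_sync()` keeps unsharded gradients around between micro-batches, so it trades memory for the saved communication.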
-
Thank you for open-sourcing such meaningful work!
I ran into some problems during training.
When training with --use_fsdp, the saved model is incomplete, and the saved state_dict only contains partial visual t…
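In case it helps narrow things down, the usual pattern for getting a complete (unsharded) state dict out of FSDP1 before saving is roughly the sketch below; `model` is the FSDP-wrapped module and the output path is a placeholder:

```python
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    StateDictType,
    FullStateDictConfig,
)

# Gather the full parameters onto rank 0 (on CPU) so the saved checkpoint
# contains every weight rather than only the local shard.
save_policy = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, save_policy):
    state_dict = model.state_dict()

if torch.distributed.get_rank() == 0:
    torch.save(state_dict, "full_model.pt")  # placeholder path
```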
-
Are there any plans to support Gemma2 in torchtitan? I tried to use torchtitan to finetune the Gemma2 model, but got stuck on the following problem: how to parallelize the tied layers in the Gemma2 model? Maybe so…
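For context, "tied layer" here means the input embedding and the output projection share one weight tensor, roughly as in the toy sketch below (class and attribute names are illustrative, not Gemma2's actual module names), which is what makes it hard to shard the two modules independently:

```python
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy model illustrating weight tying between embedding and output head."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)
        # Tie the weights: both modules now reference the same parameter,
        # so any tensor-parallel plan must shard them consistently.
        self.output.weight = self.tok_embeddings.weight

    def forward(self, tokens):
        return self.output(self.tok_embeddings(tokens))
```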
-
## Description
As a user of prompt tuning, I want to be able to leverage multiple GPUs at train time!
## Discussion
Extends https://github.com/caikit/caikit-nlp/issues/175 to leverage PyTorch…
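As a concrete starting point for the discussion, a minimal multi-GPU prompt-tuning setup could look like the sketch below, built on Hugging Face `peft` and `accelerate`; the base model id, learning rate, and virtual-token count are placeholders rather than a proposal for caikit-nlp's actual API:

```python
import torch
from accelerate import Accelerator
from peft import PromptTuningConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

accelerator = Accelerator()  # run via `accelerate launch --multi_gpu train.py`

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")  # placeholder
peft_config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=8)
model = get_peft_model(base, peft_config)  # only the prompt embeddings are trainable

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3)
model, optimizer = accelerator.prepare(model, optimizer)
# Prepare the dataloader the same way, then call accelerator.backward(loss)
# inside the training loop so gradients are handled per device.
```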
-
I am very interested in your project and appreciate all the work you have put into it.
But I ran into this bug with the project, please help me! @jph00 @johnowhitaker @KeremTurgutlu @warner-benjamin @geronimi73
World size: 2
Downlo…
-
Thank you guys for your work!
I was using FSDP + QLoRA to fine-tune Llama 3 70B on 8x A100 80G GPUs, and I encountered this error:
```shell
Traceback (most recent call last):
File "/mnt/209180/qis…
-
### 🐛 Describe the bug
I'm trying to follow the instructions to efficiently load Hugging Face models from [`torchtitan`'s docs for FSDP1 -> FSDP2: Meta-Device Initialization](https://github.com/pyt…
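For reference, the generic meta-device pattern (independent of the torchtitan-specific helpers; the model id and checkpoint path below are placeholders) is roughly:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder id

# 1) Build the module structure on the meta device: no memory is allocated.
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config)

# 2) Apply FSDP2's fully_shard(...) to the meta model here (omitted in this sketch).

# 3) Materialize real storage and fill it with the actual weights;
#    assign=True keeps the loaded tensors instead of copying into empty ones.
model.to_empty(device="cuda")
state_dict = torch.load("checkpoint.pt", map_location="cpu")  # placeholder path
model.load_state_dict(state_dict, assign=True)
```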
-
## Feature Request
Please support BF16 mixed-precision training.
## Additional context
Training with BF16 is usually more stable than with FP16, which is very important when training large models. Addit…
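For reference, in plain PyTorch a BF16 mixed-precision step is typically just autocast, and unlike FP16 it does not need a `GradScaler`; a minimal sketch with placeholder model/loss names:

```python
import torch

def train_step(model, batch, targets, loss_fn, optimizer):
    optimizer.zero_grad()
    # BF16 keeps FP32's exponent range, so loss scaling is not required.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(batch), targets)
    loss.backward()
    optimizer.step()
    return loss.detach()
```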
-
## 🐛 Bug
Related https://github.com/PyTorchLightning/pytorch-lightning/pull/6152
When wrapping the module twice in FSDP, because we introduce a `FlattenParamsWrapper` that contains all the param…
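To make the setup concrete, the double wrapping in question looks roughly like the sketch below, using fairscale's FSDP (where the referenced `FlattenParamsWrapper` lives); it assumes `torch.distributed` is already initialized, and the inner module is a placeholder:

```python
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

inner = nn.Linear(32, 32)  # placeholder module

# Double wrapping: the outer FSDP only sees the inner wrapper's single
# flattened parameter, not the original module's individual parameters.
model = FSDP(FSDP(inner))
```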