-
**Feature Overview (aka. Goal Summary)**
Implement Intel Gaudi support in the InstructLab project so that Gaudi 2 and Gaudi 3 accelerators can be used for SDG, evaluation, and training.
**Goals (aka. expected user out…
-
# Modifying parameters of FSDP-wrapped module by hand without summon_full_params context
## Issue description
I am training a large language model using FSDP.
I want to store EMA weights wh…
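The EMA bookkeeping the question asks about boils down to a simple per-parameter update. A minimal single-process sketch of that math with plain Python dicts (the names are illustrative; under FSDP each rank would apply this to its local shard, or first gather full parameters via the `summon_full_params` context the title mentions):

```python
def ema_update(ema, params, decay=0.999):
    """In-place exponential moving average: ema <- decay * ema + (1 - decay) * params."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value
    return ema

# Toy example: EMA of a single "weight" decaying toward a fixed parameter value.
ema = {"w": 1.0}
params = {"w": 0.0}
for _ in range(3):
    ema_update(ema, params, decay=0.9)
# After 3 steps the EMA is 0.9**3 = 0.729
```

The FSDP-specific difficulty is not this update but where the full parameters live; the sketch only shows the arithmetic applied once they are accessible.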
-
### 🐛 Describe the bug
Hi, I've changed PyTorch's FSDP+TP example to use the HF T5 model and ran it on 3 nodes with 2 GPUs each (6 GPUs total).
### commands
`NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT NCCL_IB_CUDA_SUPP…
-
I was able to train the Llama3-8b model with Thunder for a few steps and then save it. However, when I later try to use `litgpt generate` or `litgpt chat` with the saved checkpoint, I get an error about si…
-
### Bug description
While writing the new FSDP guide for Trainer in #18326, I observed suspiciously slow iteration speed when enabling CPU offload (see https://github.com/Lightning-AI/lightning/pull/18326#…
-
## 🚀 Feature
### Pitch
Port https://github.com/pytorch/pytorch/blob/c4a157086482899f0640d03292e5d2c9a6a3db68/torch/distributed/fsdp/fully_sharded_data_parallel.py#L1069-L1194 to work with Thunde…
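For context, the FSDP method referenced above computes a global gradient norm across all shards and rescales the gradients if it exceeds the limit. The core math, with the sharding and cross-rank all-reduce omitted (a single-process sketch, not the actual port), is roughly:

```python
import math

def clip_grad_norm_(grads, max_norm, eps=1e-6):
    """Scale gradients in place so their global L2 norm is at most max_norm.
    Mirrors the clipping math; in FSDP the squared local norms would be
    all-reduced across ranks before taking the square root."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        for i in range(len(grads)):
            grads[i] *= clip_coef
    return total_norm

grads = [3.0, 4.0]                      # global norm = 5
norm = clip_grad_norm_(grads, max_norm=1.0)
```

The Thunder port's real work is making the norm computation aware of each rank's flattened shard; the scaling step itself is unchanged.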
-
In #3740, we added support for FullyShardedDataParallel, but limited the implementation to ZeRO-2, not ZeRO-3. ZeRO-3 yields substantially lower memory usage than ZeRO-2 while bringi…
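As a rough illustration of the gap between the two stages: the ZeRO paper's accounting for mixed-precision Adam charges about 2 + 2 + 12 bytes per parameter (fp16 params, fp16 grads, fp32 optimizer state), and each successive stage shards one more of those across ranks. A back-of-the-envelope sketch (an approximation, ignoring activations and communication buffers):

```python
def zero_bytes_per_rank(n_params, n_ranks, stage):
    """Approximate per-rank memory (bytes) for mixed-precision Adam under ZeRO.
    Stage 1 shards optimizer state, stage 2 also shards grads,
    stage 3 also shards the parameters themselves."""
    params, grads, opt = 2 * n_params, 2 * n_params, 12 * n_params
    if stage >= 1:
        opt /= n_ranks
    if stage >= 2:
        grads /= n_ranks
    if stage >= 3:
        params /= n_ranks
    return params + grads + opt

# 1B parameters on 8 ranks: stage 2 keeps a full 2 GB parameter copy per rank,
# stage 3 shards it away.
stage2 = zero_bytes_per_rank(1e9, 8, stage=2)   # 3.75e9 bytes
stage3 = zero_bytes_per_rank(1e9, 8, stage=3)   # 2.0e9 bytes
```

At larger rank counts the stage-3 figure keeps shrinking toward 16 bytes / n_ranks per parameter, while stage 2 stays floored at the 2-byte full parameter copy.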
-
### System Info
```Shell
Please see
https://github.com/huggingface/peft/issues/484#issue-1718704717
```
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks…
-
Hey Team,
I'm trying to use FSDP1/2 with Float8InferenceLinear, but it seems to have some issues (with torch 2.3.1+cu118). Do you suggest bumping to a higher version of torch and trying again, or maybe use …
-
### 🐛 Describe the bug
When using FSDP sharded checkpointing, I am seeing unusually high CPU RAM usage. Specifically, let's say I am training a 7B parameter model and checkpoints are saved in `floa…
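For scale, and assuming the gather happens in float32 (the exact dtype is cut off above), materializing even one full, unsharded copy of a 7B-parameter state dict on the host is already substantial, before counting serialization buffers or per-rank duplication:

```python
def full_state_dict_gib(n_params, bytes_per_param):
    """CPU RAM (GiB) needed to hold one full, unsharded copy of the weights."""
    return n_params * bytes_per_param / 2**30

# 7B parameters at 4 bytes each is roughly 26 GiB for a single copy.
ram_gib = round(full_state_dict_gib(7e9, 4), 1)
```

If several ranks on the same node each assemble such a copy, the observed RAM usage multiplies accordingly, which is one common explanation for numbers far above the checkpoint's on-disk size.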