-
Sharding optimizer state across devices saves significant memory and reflects current practice; we want to support it (a minimal sketch of one approach follows the list below).
* We want to switch from no sharding to naive model parameter…
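For illustration only, a minimal sketch of optimizer-state sharding using PyTorch's `ZeroRedundancyOptimizer`; the toy model, optimizer choice, and launch assumptions (torchrun sets `LOCAL_RANK`) are placeholders, not the proposal's actual design:

```python
# Sketch: shard Adam's moment buffers across ranks with ZeroRedundancyOptimizer.
# Assumes launch via torchrun, which sets LOCAL_RANK/RANK/WORLD_SIZE.
import os

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])

# Each rank keeps only its shard of the optimizer state, so per-device
# optimizer memory drops roughly by a factor of the world size.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,
    lr=1e-4,
)

loss = model(torch.randn(8, 4096, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
dist.destroy_process_group()
```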
-
### 🐛 Describe the bug
I'm trying to train a LLaMA model with all linear layers + embeddings and head.
Whilst embeddings have no problems with FSDP over Liger, there are always exceptions when [ lm_head…
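Not the reporter's exact setup, but for context, a minimal sketch of wrapping a Hugging Face LLaMA model with FSDP using a transformer auto-wrap policy; the checkpoint name and wrapping choices are assumptions:

```python
# Sketch (assumed setup, not the reporter's script): FSDP-wrap a LLaMA model,
# auto-wrapping each decoder layer; embeddings and lm_head land in the root unit.
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
)

wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},
)
model = FSDP(model, auto_wrap_policy=wrap_policy, device_id=local_rank)
```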
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
###…
-
Hello, I plan to use the student and teacher weights from my DINOv2 model (trained with FSDP on 2 nodes, 16 GPUs in total, 8 GPUs per node) for downstream use in a different distil…
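One common way to reuse FSDP-trained weights in a different downstream setup is to gather a full state dict on rank 0 and save it as a plain checkpoint; a minimal sketch with a placeholder module standing in for the DINOv2 student, not the poster's actual code:

```python
# Sketch: consolidate an FSDP-sharded model into one full state dict on rank 0
# so the weights can later be loaded into a plain (non-FSDP) model.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullStateDictConfig,
    FullyShardedDataParallel as FSDP,
    StateDictType,
)

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder module standing in for the DINOv2 student/teacher backbone.
model = FSDP(torch.nn.Linear(1024, 1024).cuda())

save_policy = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, save_policy):
    full_state_dict = model.state_dict()

if dist.get_rank() == 0:
    # Plain checkpoint loadable with load_state_dict() outside FSDP.
    torch.save(full_state_dict, "student_full.pt")  # placeholder path
```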
-
### System Info
```Shell
- `Accelerate` version: 0.31.0
- Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.35
- `accelerate` bash location:
- Python version: 3.10.12
- Numpy version: 1.2…
-
### System Info
trl, transformers: most recent on github
python 3.10.11
ubuntu 22
package versions:
```
accelerate==1.0.1
addict==2.4.0
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal…
-
repro:
```
CONFIG_FILE="./train_configs/llama3_8b.toml" ./run_llama_train.sh --float8.enable_float8_linear --float8.enable_fsdp_float8_all_gather --float8.scaling_type_weight "delayed" --metrics.lo…
-
```
7: [rank80]: urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)
```
Running the FSDP example on 16 p5 nodes. The example w…
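One way to make a large multi-node run less sensitive to this kind of Hub read timeout is to raise the hub timeouts or run every rank offline from a warm cache; a sketch using huggingface_hub environment variables (names apply to recent releases, and the model id is illustrative):

```python
# Sketch: raise Hugging Face Hub timeouts, or run fully offline from a warm cache,
# before any rank touches huggingface.co. These env vars must be set before
# huggingface_hub / transformers are imported; verify the names against your
# installed huggingface_hub version.
import os

os.environ.setdefault("HF_HUB_ETAG_TIMEOUT", "60")      # metadata requests (default ~10s)
os.environ.setdefault("HF_HUB_DOWNLOAD_TIMEOUT", "60")  # file downloads
# Or pre-download on one node, then run every rank offline:
# os.environ["HF_HUB_OFFLINE"] = "1"
# os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModelForCausalLM  # noqa: E402

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # placeholder id
```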
-
I am running the full finetune distributed recipe. When I set `clip_grad_norm: 1.0` and `fsdp_cpu_offload: True`, it raises the error
`RuntimeError: No backend type associated with device type cpu`
…
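The error suggests the default process group only has a CUDA (NCCL) backend, while gradient clipping over CPU-offloaded parameters needs a CPU-capable backend; a possible workaround sketch (not the recipe's actual fix) is to register both Gloo and NCCL:

```python
# Possible workaround sketch (not the recipe's actual fix): register both a CPU
# (gloo) and a CUDA (nccl) backend on the default process group so collectives
# on CPU-offloaded tensors, e.g. during grad-norm clipping, have a backend.
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# "cpu:gloo,cuda:nccl" maps CPU tensors to gloo and CUDA tensors to nccl.
dist.init_process_group(backend="cpu:gloo,cuda:nccl")
```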
-
### Bug description
In `FSDPStrategy.save_checkpoint`, the `filepath` variable is transformed via
https://github.com/Lightning-AI/pytorch-lightning/blob/3627c5bfac704d44c0d055a2cdf6f3f9e3f9e8c1/src/…
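For context, a minimal sketch of how a `filepath` typically reaches `FSDPStrategy.save_checkpoint` through the Trainer; the toy module, device count, and checkpoint path are placeholders, not the reporter's code:

```python
# Sketch (placeholder model and path): Trainer.save_checkpoint(filepath) hands the
# path to the active strategy, here FSDPStrategy.save_checkpoint.
import torch
from torch.utils.data import DataLoader
from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.strategies import FSDPStrategy


class ToyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


trainer = Trainer(accelerator="gpu", devices=2, strategy=FSDPStrategy(), max_epochs=1)
trainer.fit(ToyModel(), DataLoader(torch.randn(64, 32), batch_size=8))
trainer.save_checkpoint("checkpoints/toy.ckpt")  # this filepath reaches the strategy
```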