-
Quote from paper
> Our use of FSDP for Llama 3 shards optimizer states and gradients, but for model shards we do not reshard after forward computation to avoid an extra all-gather communication durin…
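For context, the behaviour the quote describes maps onto PyTorch FSDP's `SHARD_GRAD_OP` sharding strategy: optimizer states and gradients are sharded, but parameters are not resharded after forward, so backward needs no extra all-gather. A minimal sketch, assuming `model` is already constructed and distributed init is done; this is not the paper's actual code:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# SHARD_GRAD_OP shards gradients and optimizer states but keeps parameters
# unsharded after the forward pass, avoiding the extra all-gather in backward
# (at the cost of higher peak memory than full sharding).
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
)
```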
-
When adding the NVMe drives, we changed which cards are installed in nodes 01 and 02, and also removed the bifurcating PCI-e card from the Mellanox cards in nodes 09 and 10. We need to update the mach…
-
Hello,
We converted the paxml checkpoint and resumed training with the following config:
```
base_config: "base.yml"
tokenizer_path: "/dockerx/vocab/c4_en_301_5Mexp2_spm.model"
dataset_type: "tfds"
…
```
-
Hi, does mup support training with FSDP? I have a model training with DDP, but when switching to FSDP I get the following assertion:
`assert hasattr(self.weight, 'infshape'), (`
`AssertionError: Plea…`
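For reference, this is the order mup's own examples use: `set_base_shapes` attaches `.infshape` to each parameter of the raw module, and the mup optimizer reads it, so both steps have to happen before any parallel wrapper flattens the parameters. A rough sketch with `MyModel`, `target_kwargs`, and `base_kwargs` as placeholders; whether FSDP's flattened parameters still carry `infshape` afterwards is exactly what the assertion above is checking, so treat this as context rather than a confirmed fix:

```python
from mup import set_base_shapes, MuAdam
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

model = MyModel(**target_kwargs)          # the width you actually train
base = MyModel(**base_kwargs)             # same architecture at base width
set_base_shapes(model, base)              # attaches .infshape to every parameter
optimizer = MuAdam(model.parameters(), lr=1e-3)  # mup optimizer reads infshape here
model = FSDP(model)                       # wrap only after the mup setup is done
```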
-
I expected a training configuration with per_device_train_batch_size=1 and gradient_accumulation_steps=32 to yield the same (or a similar) result as per_device_train_batch_size=32 and gradient_accumulat…
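The usual equivalence argument is that each micro-batch loss is divided by the number of accumulation steps before `backward()`, so 1×32 and 32×1 see the same effective batch size and should match up to numerical noise, dropout RNG, and anything batch-size-dependent such as batch norm. A minimal hand-rolled sketch, with `model`, `loader`, and `optimizer` as placeholders:

```python
accum_steps = 32  # per_device_train_batch_size=1, gradient_accumulation_steps=32

optimizer.zero_grad()
for step, batch in enumerate(loader):
    loss = model(**batch).loss / accum_steps  # scale so the accumulated gradient
    loss.backward()                           # matches one batch of size 32
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```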
-
Hi,
In the FSDP [docs](https://github.com/huggingface/accelerate/blob/main/docs/source/usage_guides/fsdp.md) it says:
> When using transformers `save_pretrained`, pass `state_dict=accelerator.ge…
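The full pattern that sentence describes looks roughly like the following; a sketch based on the linked docs, assuming `accelerator`, `model`, and `output_dir` are already set up:

```python
# Gather the full (unsharded) state dict across FSDP ranks, then save it with
# transformers' save_pretrained on the unwrapped model.
state_dict = accelerator.get_state_dict(model)
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    output_dir,
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=state_dict,
)
```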
-
Hi, thanks for sharing the amazing work.
I was wondering if you could provide more details on how to train OSM with FSDP.
I set args.fsdp to True and get the following error:
AttributeError: 'O…
-
### System Info
- `transformers` version: 4.42.0
- Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.35
- Python version: 3.9.19
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4…
-
PyTorch version too old for fused optimizer
```
llm-full-mp-gpus.0 [stderr] [rank0]: Traceback (most recent call last):
llm-full-mp-gpus.0 [stderr] [rank0]: File "/homes/delaunap/milabench/benc…
```
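One way to sidestep this class of failure is to only pass `fused=True` when the installed PyTorch actually accepts the flag (fused CUDA AdamW appeared around PyTorch 1.13). A small sketch, with `model` as a placeholder:

```python
import inspect
import torch

# Only request the fused kernel when this PyTorch build exposes the flag;
# older versions raise if they see an unexpected `fused` keyword.
fused_available = "fused" in inspect.signature(torch.optim.AdamW).parameters
extra = {"fused": True} if fused_available and torch.cuda.is_available() else {}
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, **extra)
```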
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…