-
When I fine-tune Llama-2-7B:
```
# alpaca
torchrun --nproc_per_node=8 --master_port=29000 train.py \
--model_name_or_path .cache/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
latest llamafactory version
### Reproduction
I'm using the latest LLaMA-Factory version to run SFT (QLoRA…
-
### 🐛 Describe the bug
with TP
![Screenshot 2024-10-08 at 6 41 21 AM](https://github.com/user-attachments/assets/31421762-5742-4184-a52f-36a5de388eaf)
without TP
![Screenshot 2024-10-08 at 4 19 …
-
### What piece of documentation is affected?
https://github.com/OpenAccess-AI-Collective/axolotl/tree/main/examples/mistral
### What part(s) of the article would you like to see updated?
There's FS…
-
### 🚀 The feature, motivation and pitch
I created a tensor of shape (2, N) in a module and wrapped it with FSDP on 8 GPUs. The resulting local shard shapes are:
GPU 0, 1 -> (1, N)
GPU 2 to 7 -> (0, N)
This is a b…
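These shapes are what per-parameter dim-0 sharding would produce when a leading dimension of 2 is split across 8 ranks. A minimal repro sketch is below; the `fully_shard` import path, the toy module, and N are assumptions, not taken from the report:

```python
# Minimal repro sketch (assumptions: PyTorch >= 2.6 so fully_shard is importable
# from torch.distributed.fsdp; the module and N are made up; launch with
# `torchrun --nproc_per_node=8`).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard


class TwoRowModule(torch.nn.Module):
    """Hypothetical module holding a single (2, N) parameter."""

    def __init__(self, n: int = 16):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(2, n))

    def forward(self, x):
        return x @ self.weight.t()


def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    module = TwoRowModule().cuda()
    fully_shard(module)  # per-parameter sharding along dim 0 across all ranks

    # The parameter is now a DTensor; with 8 ranks and a leading dim of 2,
    # ranks 0-1 hold a (1, N) local shard and ranks 2-7 hold an empty (0, N) shard.
    local = module.weight.to_local()
    print(f"rank {rank}: local shard shape = {tuple(local.shape)}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```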
-
I was wondering if PyTorch's FullyShardedDataParallel (FSDP) is supported by TransformerEngine, especially if FP8 can work with FSDP. Thank you in advance.
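For reference, a rough sketch of what combining the two would look like, wrapping TE modules in FSDP and running the forward under `te.fp8_autocast`. The toy model, layer sizes, and recipe values are assumptions, and this does not claim the combination is officially supported:

```python
# Rough sketch (assumptions: transformer_engine installed, FP8-capable GPUs,
# toy layer sizes and recipe values; launch with torchrun so NCCL init works).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

# TE handles the FP8 compute inside its modules; FSDP shards the
# higher-precision master weights as usual.
model = torch.nn.Sequential(
    te.Linear(1024, 4096),
    te.Linear(4096, 1024),
).cuda()
model = FSDP(model, use_orig_params=True)

recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)
x = torch.randn(16, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = model(x)
out.sum().backward()
```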
-
### Describe the bug
I'm using the train_dreambooth_flux.py script to fine-tune Flux. I get OOM on 4x A100 80GB with DeepSpeed stage 2, gradient checkpointing, bf16 mixed precision, 1024px × 1024px input, adafac…
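A rough sketch of that kind of setup through Accelerate's DeepSpeed integration is below; the plugin arguments, stand-in model, and batch size are illustrative assumptions rather than the actual train_dreambooth_flux.py configuration:

```python
# Rough sketch of a ZeRO-2 + bf16 setup via Accelerate's DeepSpeed integration
# (assumptions: deepspeed installed, launched with `accelerate launch` on the GPUs;
# the Linear model is only a stand-in for the Flux transformer).
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=ds_plugin)

model = torch.nn.Linear(4096, 4096)              # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 4096))
loader = torch.utils.data.DataLoader(dataset, batch_size=1)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
# On an HF model one would also call model.gradient_checkpointing_enable() here.

batch = next(iter(loader))[0].to(torch.bfloat16)  # DeepSpeed bf16 keeps params in bf16
accelerator.backward(model(batch).sum())
optimizer.step()
optimizer.zero_grad()
```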
-
### System Info
```Shell
Custom SDXL training script using FSDP SHARD_GRAD_OP with CPU offload.
After upgrading accelerate from 0.33.0 to 0.34.0, when collecting the state_dict with accelerator.get_state_…
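A minimal sketch of such a setup, assuming the truncated call above is `Accelerator.get_state_dict` and using placeholder model sizes:

```python
# Hedged sketch of an FSDP SHARD_GRAD_OP + CPU-offload setup in Accelerate
# (assumptions: launched with `accelerate launch` or torchrun; the Linear model
# is a placeholder for the SDXL UNet).
import torch
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin
from torch.distributed.fsdp import CPUOffload, ShardingStrategy

fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,  # shard gradients and optimizer state
    cpu_offload=CPUOffload(offload_params=True),
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(1024, 1024)  # placeholder model
model = accelerator.prepare(model)

# Gather a full (unsharded) state dict for saving on the main process.
state_dict = accelerator.get_state_dict(model)
if accelerator.is_main_process:
    torch.save(state_dict, "checkpoint.pt")
```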
-
## ❓ Questions and Help
I have noticed during testing that enabling FSDP's flatten_parameters=True results in a significant increase in GPU peak memory. In fact, the memory usage is several times la…
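A small sketch for measuring the difference, assuming FairScale's FSDP (where the keyword is `flatten_parameters`) and illustrative layer sizes:

```python
# Hedged sketch for comparing peak memory with and without parameter flattening
# (assumptions: FairScale installed, single-node NCCL launch via torchrun,
# toy model sizes chosen only for illustration).
import torch
import torch.distributed as dist
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP


def peak_memory_mb(flatten: bool) -> float:
    torch.cuda.reset_peak_memory_stats()
    model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()
    model = FSDP(model, flatten_parameters=flatten)
    x = torch.randn(16, 4096, device="cuda")
    model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 2**20


if __name__ == "__main__":
    dist.init_process_group("nccl")  # launch with torchrun --nproc_per_node=<ngpus>
    torch.cuda.set_device(dist.get_rank())
    for flatten in (False, True):
        print(f"flatten_parameters={flatten}: peak {peak_memory_mb(flatten):.0f} MiB")
```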
-
### System Info
- `transformers` version: 4.41.1
- Platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.1
- Safetensors version: 0.…