-
### System Info
I'm using SageMaker to run fine-tuning on an ml.g5.48xlarge with this requirements file:
```
transformers==4.44.2
datasets==3.0.0
accelerate==0.34.2
bitsandbytes==0.44.0
hugging…
-
### System Info
```Shell
- `Accelerate` version: 0.29.2
- Platform: Linux-6.5.0-44-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /home/oskar/projects/robust-llm/venv/bin/accelerate…
ojh31 updated 3 weeks ago
-
**Describe the bug**
What the bug is and how to reproduce it, preferably with screenshots
**Your hardware and system info**
Write your system info like CUDA version/system/GPU/torc…
-
### 🐛 Describe the bug
While adding FSDP to our training code base, which includes a pipelining scheme, I encountered an issue when forward and backward passes are no longer interleaved but instead …
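The schedule being described can be sketched without any distributed setup. In this minimal, illustrative example (the module and microbatch shapes are assumptions, not from the report), every microbatch runs forward before any backward starts; under FSDP this matters because parameters resharded after forward must be re-gathered for the delayed backwards.

```python
import torch
import torch.nn as nn

# Stand-in for one pipeline stage; sizes are arbitrary for the sketch.
model = nn.Linear(8, 1)
microbatches = [torch.randn(4, 8) for _ in range(3)]

# Phase 1: run every microbatch forward, keeping all losses (and their
# autograd graphs) alive instead of interleaving forward/backward.
losses = [model(mb).pow(2).mean() for mb in microbatches]

# Phase 2: only now run the backward passes; gradients accumulate.
for loss in losses:
    loss.backward()

print(model.weight.grad is not None)
```

With FSDP wrapping the stage, phase 2 is where each shard must be all-gathered again, which is the point at which the reported issue surfaces.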
-
In the [example T5 training code](https://github.com/pytorch/examples/blob/cdef4d43fb1a2c6c4349daa5080e4e8731c34569/distributed/FSDP/T5_training.py#L77C24-L77C35), the main function creates a copy of …
-
### Bug description
PyTorch Lightning is taking more memory than plain PyTorch FSDP.
I'm able to train the gemma-2b model, but it takes 3 times more memory.
For openchat it goes out of memory.
…
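To make a "3 times more memory" claim comparable across setups, one option is to read the CUDA allocator's peak counter after a training step in each run. This is a hedged sketch; the function name `report_peak_memory` is illustrative, and it returns 0.0 on machines without CUDA so it runs anywhere.

```python
import torch

def report_peak_memory(tag: str) -> float:
    """Return peak allocated GPU memory in GiB (0.0 when CUDA is absent)."""
    if not torch.cuda.is_available():
        return 0.0
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"{tag}: peak GPU memory {peak:.2f} GiB")
    torch.cuda.reset_peak_memory_stats()  # start a fresh measurement window
    return peak

# Call once after an identical training step in each setup
# (e.g. Lightning vs. plain FSDP) to get comparable numbers.
print(report_peak_memory("after step") >= 0.0)
```

Resetting the peak between measurements keeps each window independent, so a one-off allocation spike in setup does not inflate later readings.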
-
per https://discord.com/channels/1104757954588196865/1111279858136383509/1116644094484164609
SyLM — 06/09/2023 4:24 AM
Yeah, I did that
I wondered if there was a reason I was not aware of, and t…
-
### System Info
torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2
### Information
- [ ] The official example scripts
- [ ] My own…
-
### 🐛 Describe the bug
When using FSDP (Fully Sharded Data Parallel) to save a model, some parameters are not fully gathered on rank 0 and therefore not properly saved. This issue occurs specifical…
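A common pattern for avoiding partially gathered checkpoints is to switch the model into `FULL_STATE_DICT` mode before reading `state_dict()`, so FSDP all-gathers the shards and materializes the full weights on rank 0. The sketch below is illustrative (the helper name `save_full_checkpoint` is an assumption); it falls back to a plain save for non-FSDP modules so it also runs outside a distributed job.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullStateDictConfig,
    FullyShardedDataParallel as FSDP,
    StateDictType,
)

def save_full_checkpoint(model: torch.nn.Module, path: str) -> None:
    """Gather a full (unsharded) state dict on rank 0 and save it there."""
    if isinstance(model, FSDP):
        cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
        with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
            state = model.state_dict()  # all-gathered; full only on rank 0
    else:
        state = model.state_dict()      # non-FSDP fallback for this sketch
    if not dist.is_initialized() or dist.get_rank() == 0:
        torch.save(state, path)

# Smoke test on a plain module (no process group needed).
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
save_full_checkpoint(torch.nn.Linear(4, 2), path)
print(os.path.exists(path))
```

`offload_to_cpu=True` keeps the gathered weights out of GPU memory during the save, and `rank0_only=True` avoids materializing the full state dict on every rank.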
-
I run finetune.py without LoRA and use FSDP, but there is an error in fsdp/_init_utils.py, line 889: inconsistent compute device and `device_id` on rank 3: cuda:0 vs cuda:3. I run the program in …
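This mismatch typically appears when every rank's current CUDA device is still cuda:0 while FSDP expects each rank to sit on its own GPU. A hedged sketch of the usual fix, assuming a `torchrun`-style launch where `LOCAL_RANK` is set (the CPU fallback is only so the sketch runs anywhere):

```python
import os
import torch

# Bind this process to its own GPU *before* wrapping the model in FSDP,
# so the compute device and FSDP's device_id agree on every rank.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

if torch.cuda.is_available():
    torch.cuda.set_device(local_rank)   # current device -> cuda:<local_rank>
    device = torch.device("cuda", local_rank)
else:
    device = torch.device("cpu")        # fallback for non-GPU machines

print(device.type)
# Then construct FSDP(model, device_id=local_rank, ...) so rank 3 uses
# cuda:3 rather than inheriting cuda:0.
```

Passing `device_id` explicitly to the FSDP constructor, in addition to `torch.cuda.set_device`, makes the per-rank device unambiguous.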