-
## Context
* PyTorch version: 3.10
* Operating System and version: Ubuntu 22.04
## Your Environment
* Installed using source? [yes/no]: no
* Are you planning to deploy it using docker con…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
I see that [PEFT brought in](https://github.com/huggingface/peft/releases/tag/v0.10.0) QLoRA with FSDP support in their latest release.
Are there any plans to incorporate this into LitGPT?
-
When I use 8-bit Adam with FSDP, I get the following error:
`RuntimeError: output tensor must have the same type as input tensor`
If my understanding is correct, there seems to be a casting i…
-
We don't know much about these test failures, but they occurred for @Flamefire (`distributed/fsdp/test_fsdp_core`) and @branfosj (`test_native_mha`). The latter fails only on Broadwell.
We don'…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
pass
### Reproduction
```
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" accelerate launch \
--config_fil…
```
-
### System Info
- `transformers` version: 4.44.2
- Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
### Bug description
In manual optimization, the user can call `self.backward()` anywhere in `training_step()`. There are no limitations for this in single-device execution, but for distributed strate…
-
### Describe the bug
This time I set the number of steps to 2 to make sure it saves the model correctly after an hour of training, but it does not.
### Reproduction
Run `accelerate config`
```
comp…
```