-
Hello,
I'm running `bitsandbytes==0.41.1` in a Python 3.10 Miniconda environment on 8x A100 GPUs (using `accelerate` for multi-GPU) with CUDA 12.2.
I'm having problems resuming training (DPO) from a ch…
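The report is cut off above. For reference, here is a minimal sketch of resuming a DPO run; it assumes TRL's `DPOTrainer` (the excerpt does not name the training library), and the model name, dataset, and output directory are placeholders. For multi-GPU it would be launched with `accelerate launch`.
```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# all names below are placeholders, not taken from the original report
model = AutoModelForCausalLM.from_pretrained("my-org/my-sft-model")
tokenizer = AutoTokenizer.from_pretrained("my-org/my-sft-model")

# DPO preference data needs "prompt"/"chosen"/"rejected" columns
train_ds = Dataset.from_dict({
    "prompt": ["Q: 2+2?"],
    "chosen": ["4"],
    "rejected": ["5"],
})

args = TrainingArguments(output_dir="dpo-ckpts", per_device_train_batch_size=1)
trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones the policy as the frozen reference model
    args=args,
    beta=0.1,
    train_dataset=train_ds,
    tokenizer=tokenizer,
)

# resume_from_checkpoint=True picks up the latest checkpoint in output_dir
trainer.train(resume_from_checkpoint=True)
```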
-
## 🐛 Bug
### To Reproduce
Code:
```python
import os
import torch
import torch.distributed as tdist
import thunder
from thunder.tests.litgpt_model import GPT, Config
if __name__ == "__…
```
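Since the script is truncated mid-line, here is a hypothetical single-process sketch in the same spirit; the tiny `Config` values and the `thunder.jit` call are assumptions, not the reporter's actual code (which uses `torch.distributed`):
```python
import torch
import thunder
from thunder.tests.litgpt_model import GPT, Config

# hypothetical stand-in for the truncated script: a tiny config so the
# example runs on a single GPU
config = Config(block_size=128, vocab_size=320, padded_vocab_size=320,
                n_layer=2, n_head=4, n_embd=128)
model = GPT(config).to("cuda")
jitted = thunder.jit(model)  # compile the model with Thunder

x = torch.randint(0, config.padded_vocab_size, (1, 128), device="cuda")
logits = jitted(x)  # one forward pass through the jitted model
print(logits.shape)
```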
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
[2024-07-12 02:22:28,334] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda…
-
### 🐛 Describe the bug
I have a custom implementation of TP which uses a device mesh to lay out the tensors and then wraps them as DTensors. I then pass the device mesh into FSDP for wrapping. Concretely, I am…
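A minimal sketch of that layout, assuming PyTorch >= 2.2 and an 8-GPU run under `torchrun`; the mesh shape and the `nn.Linear` stand-in are placeholders for the reporter's actual TP model:
```python
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# hypothetical sketch; run with torchrun --nproc_per_node=8.
# 2D mesh: "dp" for FSDP sharding, "tp" for the custom DTensor layout.
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))
tp_mesh = mesh_2d["tp"]  # the custom TP code would lay DTensors out over this
dp_mesh = mesh_2d["dp"]  # handed to FSDP for wrapping

model = nn.Linear(1024, 1024).cuda()  # stand-in for the real TP model
model = FSDP(model, device_mesh=dp_mesh, use_orig_params=True)
```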
-
### Description & Motivation
https://github.com/pytorch/pytorch/pull/104810 adds the recommendation that the `save` APIs be called within a single node (`shard_group`).
https://github.com/pyt…
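For illustration, a sketch of what "call `save` within a single node" could look like with the current API; the group construction, rank layout, and checkpoint path are assumptions, not the behavior proposed in the linked PRs:
```python
import torch.nn as nn
import torch.distributed as dist
import torch.distributed.checkpoint as dcp

# hypothetical sketch: save from the ranks of one node (a "shard group")
# only. Assumes an 8-rank job where ranks 0-3 share the first node; gloo
# keeps the example CPU-only.
dist.init_process_group("gloo")
node0_ranks = [0, 1, 2, 3]
shard_group = dist.new_group(ranks=node0_ranks)  # collective: run on all ranks

model = nn.Linear(8, 8)  # placeholder model
if dist.get_rank() in node0_ranks:
    dcp.save({"model": model.state_dict()},
             checkpoint_id="ckpt/step-100",
             process_group=shard_group)
```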
-
### Bug description
See the deprecation warnings added in https://github.com/pytorch/pytorch/pull/113867
### What version are you seeing the problem on?
v2.2
### How to reproduce the bug
Or…
-
Here's the command I ran:
```bash
python train.py \
    --model_name meta-llama/Llama-2-70b-hf \
    --batch_size 1 \
    --context_length 1024 \
    --precision bf16 \
    --train_type hqq_lora \
    --use_gradient_ch…
```
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
### Description & Motivation
Both the Fabric and Trainer strategies are designed to have a single plugin enabled from the beginning to the end of the program.
This has been fine historically, ho…
-
**Feature Overview (aka. Goal Summary)**
Implement Intel Gaudi support in the InstructLab project, so that Gaudi 2 and Gaudi 3 can be used for SDG, evaluation, and training.
**Goals (aka. expected user out…