-
Hi, I'm training on the huge bread-midi-dataset.
Enumerating the dataloader from a DatasetJSON throws a KeyError in collators.py
line 164: length_of_first = batch[0].size(0)
when the batch is emp…
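Not from the dataset's actual code, but a minimal sketch of the kind of guard that would make this failure mode explicit (the `collate` name, the return value, and the error message are hypothetical):

```python
def collate(batch):
    # Hypothetical guard: fail loudly if the DataLoader hands over an empty batch,
    # instead of letting the indexing below raise further down the stack.
    if len(batch) == 0:
        raise ValueError("collate received an empty batch")
    length_of_first = batch[0].size(0)  # the line cited above (collators.py line 164)
    return batch, length_of_first
```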
-
## 🐛 Bug
When I write my own (single-core; haven't tested multi-core yet!) loop for PyTorch model training with gradient accumulation on TPUs, I get an OOM error when running with gradien…
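For context, a minimal single-core sketch of what such a gradient-accumulation loop on an XLA/TPU device can look like (not the poster's code; the function name, loss, and `accum_steps` value are illustrative assumptions):

```python
import torch
import torch_xla.core.xla_model as xm

def train_epoch(model, loader, optimizer, accum_steps=4):
    """Single-core TPU training loop with gradient accumulation (illustrative sketch)."""
    device = xm.xla_device()
    model.to(device)
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        inputs, targets = inputs.to(device), targets.to(device)
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        # Scale so the accumulated gradient matches the large-batch average.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            xm.optimizer_step(optimizer)  # apply the update on the XLA device
            optimizer.zero_grad()
        # Cut the lazy XLA graph every iteration so it does not keep growing
        # across the accumulation window and exhaust memory.
        xm.mark_step()
```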
-
### System Info
I'm using Python 3.10 and the latest versions of all libraries as of 04.10.2024,
and I'm trying to run: trl sft --model_name_or_path meta-llama/Llama-3.2-3B --dataset_name Vikhrmodels/GrandMaster-PRO…
-
I got `SystemError: returned NULL without setting an error` when setting **accumulate_grad_batches = 2**, but I see nothing helpful in the log.
The error goes away when changing `DDPStrategy(static_graph=F…
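For reference, a minimal Trainer configuration matching the settings described above (a sketch only; the truncated `static_graph=F…` value is assumed to be `False`, and the accelerator/device counts are placeholders):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import DDPStrategy

# Hypothetical reproduction of the reported configuration.
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(static_graph=False),  # assumed value; the original is cut off
    accumulate_grad_batches=2,                 # the setting that triggered the SystemError
)
```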
-
Hello guys,
I wanted to try your version of Phi-3.5-mini-instruct with the DPO Trainer from Hugging Face.
But when I run the training I get *NaN or Inf found in input tensor.*
Same code wor…
-
Support for gradient accumulation with a lower batch size to accommodate large images on a single 16 GB GPU?
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
https://huggingface.co/docs/transformers/v4.38.2/perf_train_gpu_one#gradient-accumulation
In the `TrainingArguments` passed to `SFTTrainer`, we can likely reduce the total GPU memory required to tr…
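A minimal sketch of that configuration (illustrative values; the model, dataset, and output directory are assumptions, and depending on the trl version `SFTConfig`, a `TrainingArguments` subclass, may be expected instead):

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# Trade a smaller per-device batch for more accumulation steps so the effective
# per-device batch size (4 * 8 = 32 here) stays the same while peak memory drops.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,  # optional further memory saving
)
trainer = SFTTrainer(
    model=model,                  # model and train_dataset assumed defined elsewhere
    args=args,
    train_dataset=train_dataset,
)
```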
-
### Description & Motivation
There is an easy way to do gradient accumulation in Lightning, but as I understand it, batch norm is problematic since its statistics are calculated on every forward pass.
We should fix…
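To make the mismatch concrete, a tiny self-contained demonstration (illustrative only, not part of the original request): with accumulation, BatchNorm sees each micro-batch separately, so its statistics differ from those of one large batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3)

bn_full = nn.BatchNorm1d(3)
bn_accum = nn.BatchNorm1d(3)

# One large batch: normalization statistics computed over all 8 samples at once.
bn_full(x)

# Gradient accumulation: the same 8 samples, but split into two micro-batches of 4,
# so statistics are computed (and running stats updated) per micro-batch.
bn_accum(x[:4])
bn_accum(x[4:])

# The running means differ, showing why accumulation is not equivalent to a
# larger batch once BatchNorm is involved.
print(bn_full.running_mean, bn_accum.running_mean)
```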
-
**Describe the bug**
I reviewed the initialization of self.gradient_accumulation_steps in the DeepSpeedConfig module when only train_batch and micro_batch are set (deepspeed Version: 0.13.1):
```p…