-
Hi, I'm trying to fine-tune OLMo but am running into the error `ValueError: OLMoForCausalLM does not support gradient checkpointing.` Is support for gradient checkpointing planned for the future?
Thanks for releasing OLMo!
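For context, a manual fallback when a model class doesn't expose gradient checkpointing is to wrap its blocks with `torch.utils.checkpoint` yourself. The toy model below is purely illustrative (it is not OLMo, and all names in it are assumptions), sketching how per-block checkpointing works:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Toy residual feed-forward block standing in for a transformer layer."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class ToyModel(nn.Module):
    def __init__(self, dim=32, depth=4, use_checkpoint=False):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for blk in self.blocks:
            if self.use_checkpoint and self.training:
                # Discard this block's activations and recompute its forward
                # pass during backward, trading compute for memory.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x

model = ToyModel(use_checkpoint=True).train()
out = model(torch.randn(2, 32, requires_grad=True))
out.sum().backward()
```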
-
### Feature request
Is there a tutorial for using DeepSpeed's activation checkpointing instead of PyTorch's?
I'm using `Trainer` with ZeRO integration to train my model. Here's my code:
```py…
-
Thank you for your open research and exploration!
It seems that there are some bugs in ZeRO stage 2 if `gradient_checkpointing` is set to True.
![image](https://github.com/user-attachments/assets/57050e6c…
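One common source of such bugs is the reentrant checkpointing default. A minimal plain-PyTorch sketch (not the ZeRO stage-2 setup itself) of why `use_reentrant=False` is the usual fix: with the non-reentrant variant, parameter gradients still flow even when the checkpointed block's input does not require grad, which is exactly the case the reentrant implementation breaks on.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

lin = nn.Linear(4, 4)
# An input that does not require grad (e.g. the first checkpointed block
# after an embedding) is the classic failure mode of reentrant checkpointing.
x = torch.randn(2, 4)

# Non-reentrant checkpointing still propagates gradients to the parameters.
out = checkpoint(lin, x, use_reentrant=False)
out.sum().backward()
print(lin.weight.grad is not None)  # True
```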
-
I am trying to run single-GPU to multi-node distributed fine-tuning for the Llama3-70B and Llama3-8B models.
Below is my training configuration:
SFT (Llama3 8B & 70B)
Epochs: 3
Gradient Accumulatio…
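For reference, the effective batch size implied by such a configuration is the product of the per-device batch size, the gradient accumulation steps, and the world size. The numbers below are illustrative assumptions, not the poster's actual values:

```python
# Effective batch size arithmetic for a multi-node run (illustrative values).
per_device_batch = 2
grad_accum_steps = 8
num_gpus = 16  # e.g. 2 nodes x 8 GPUs
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 256
```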
-
I am trying to train on an 8xA100 instance. If I set `trainer_arguments.gradient_checkpointing` to `True`, the training hangs for a while and then dies with a `Segmentation fault (core dumped)` error. …
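A mitigation worth trying (a configuration sketch assuming a recent `transformers` version, not a confirmed fix for this particular crash) is switching to the non-reentrant checkpointing implementation via `gradient_checkpointing_kwargs`:

```python
# Configuration sketch only; output_dir and the rest of the poster's
# arguments are omitted/assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    # Select the non-reentrant checkpoint implementation, which avoids
    # several hangs/crashes seen with the reentrant default.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```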
-
I was trying to use [gradient checkpointing](https://pytorch.org/docs/stable/checkpoint.html) with the TorchMD model. For some reason, I get a gradient mismatch whenever the warning pops up:
https:/…
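As a sanity check outside TorchMD: on a deterministic module, checkpointed and direct backward passes should produce identical input gradients. The toy module below is illustrative, not TorchMD itself:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(3, 3), nn.Tanh())  # deterministic, no dropout
x = torch.randn(2, 3, requires_grad=True)

# Direct backward pass.
y1 = net(x)
y1.sum().backward()
g1 = x.grad.clone()
x.grad = None

# Checkpointed backward pass: the forward is recomputed during backward.
y2 = checkpoint(net, x, use_reentrant=False)
y2.sum().backward()
g2 = x.grad.clone()

print(torch.allclose(g1, g2))  # True
```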
-
Hi,
I'm using an NVIDIA L20 (48GB), and when I execute `sh reproduce/HLLM-Pixel.sh` I get a `CUDA out of memory` error (like below). I tried reducing the train_batch_size from 8 to 2, but the …
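Activation checkpointing usually frees far more memory than shrinking the batch alone. A back-of-envelope estimate with assumed, purely illustrative model dimensions (not HLLM-Pixel's actual sizes):

```python
# Rough activation-memory estimate; all dimensions are assumptions.
batch, seq, hidden, layers, fp16_bytes = 8, 2048, 4096, 32, 2
full_gib = batch * seq * hidden * layers * fp16_bytes / 2**30
# With checkpointing, roughly one layer of activations stays live at a time.
ckpt_gib = full_gib / layers
print(full_gib, ckpt_gib)  # 4.0 0.125
```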
-
### Describe the bug
Activating `--gradient_checkpointing` in either the LoRA or DB scripts for SD3 causes: TypeError: layer_norm(): argument 'input' (position 1) must be Tensor, no…
-
```py
import logging
import os
import json
import torch
from datasets import load_from_disk
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel…
```
-
- Selectively recompute the forward pass of some operations in the backward pass to save memory.
- Replace `transformers`'s gradient checkpointing with pipegoose's gradient checkpointing.
**APIs**…
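A minimal sketch of what a selective-recomputation API could look like, built on `torch.utils.checkpoint`. The `selective_checkpoint` helper and the predicate interface are hypothetical illustrations; pipegoose's actual API may differ:

```python
import functools
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

def selective_checkpoint(model, predicate):
    """Hypothetical helper: recompute the forward pass of submodules
    selected by `predicate` during the backward pass to save memory."""
    for module in model.modules():
        if predicate(module):
            fwd = module.forward
            # Route calls through checkpoint so activations are recomputed.
            module.forward = functools.partial(checkpoint, fwd, use_reentrant=False)
    return model

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
# Only recompute the Linear layers; leave everything else as-is.
selective_checkpoint(model, lambda m: isinstance(m, nn.Linear))

x = torch.randn(4, 8, requires_grad=True)
model(x).sum().backward()
```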