-
Hi, I'm trying to fine-tune OLMo but am running into the error `ValueError: OLMoForCausalLM does not support gradient checkpointing.` Is support for gradient checkpointing planned for the future?
Thanks for releasing OLMo!
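For context, a manual fallback when a model class doesn't expose gradient checkpointing is to wrap its blocks with `torch.utils.checkpoint` yourself. The toy model below is purely illustrative (it is not OLMo, and all names in it are assumptions), sketching how per-block checkpointing works:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Toy residual feed-forward block standing in for a transformer layer."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class ToyModel(nn.Module):
    def __init__(self, dim=32, depth=4, use_checkpoint=False):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for blk in self.blocks:
            if self.use_checkpoint and self.training:
                # Discard this block's activations and recompute its forward
                # pass during backward, trading compute for memory.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x

model = ToyModel(use_checkpoint=True).train()
out = model(torch.randn(2, 32, requires_grad=True))
out.sum().backward()
```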
-
### Feature request
Is there a tutorial for using DeepSpeed's activation checkpointing instead of PyTorch's?
I'm using `Trainer` with ZeRO integration to train my model. Here's my code:
```py…
-
Thank you for your open research and exploration!
It seems that there are some bugs in ZeRO stage 2 if `gradient_checkpointing` is set to True.
![image](https://github.com/user-attachments/assets/57050e6c…
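One common source of such bugs is the reentrant checkpointing default. A minimal plain-PyTorch sketch (not the ZeRO stage-2 setup itself) of why `use_reentrant=False` is the usual fix: with the non-reentrant variant, parameter gradients still flow even when the checkpointed block's input does not require grad, which is exactly the case the reentrant implementation breaks on.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

lin = nn.Linear(4, 4)
# An input that does not require grad (e.g. the first checkpointed block
# after an embedding) is the classic failure mode of reentrant checkpointing.
x = torch.randn(2, 4)

# Non-reentrant checkpointing still propagates gradients to the parameters.
out = checkpoint(lin, x, use_reentrant=False)
out.sum().backward()
print(lin.weight.grad is not None)  # True
```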
-
I am trying to run single-GPU to multi-node distributed fine-tuning for the Llama3-70B and Llama3-8B models.
Below is my training configuration:
SFT (Llama3 8B & 70B)
Epochs: 3
Gradient Accumulatio…
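For reference, the effective batch size implied by such a configuration is the product of the per-device batch size, the gradient accumulation steps, and the world size. The numbers below are illustrative assumptions, not the poster's actual values:

```python
# Effective batch size arithmetic for a multi-node run (illustrative values).
per_device_batch = 2
grad_accum_steps = 8
num_gpus = 16  # e.g. 2 nodes x 8 GPUs
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 256
```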
-
I am trying to train on an 8xA100 instance. If I set `trainer_arguments.gradient_checkpointing` to `True`, the training hangs for a while and then dies with a `Segmentation fault (core dumped)` error. …
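A mitigation worth trying (a configuration sketch assuming a recent `transformers` version, not a confirmed fix for this particular crash) is switching to the non-reentrant checkpointing implementation via `gradient_checkpointing_kwargs`:

```python
# Configuration sketch only; output_dir and the rest of the poster's
# arguments are omitted/assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    # Select the non-reentrant checkpoint implementation, which avoids
    # several hangs/crashes seen with the reentrant default.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```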
-
I was trying to use [gradient checkpointing](https://pytorch.org/docs/stable/checkpoint.html) with the TorchMD model. For some reason, I get a gradient mismatch whenever the warning pops up:
https:/…
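As a sanity check outside TorchMD: on a deterministic module, checkpointed and direct backward passes should produce identical input gradients. The toy module below is illustrative, not TorchMD itself:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(3, 3), nn.Tanh())  # deterministic, no dropout
x = torch.randn(2, 3, requires_grad=True)

# Direct backward pass.
y1 = net(x)
y1.sum().backward()
g1 = x.grad.clone()
x.grad = None

# Checkpointed backward pass: the forward is recomputed during backward.
y2 = checkpoint(net, x, use_reentrant=False)
y2.sum().backward()
g2 = x.grad.clone()

print(torch.allclose(g1, g2))  # True
```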
-
Hi,
I'm using an NVIDIA L20 (48GB), and when I execute `sh reproduce/HLLM-Pixel.sh` I get a `CUDA out of memory` error (like below). I tried reducing the train_batch_size from 8 to 2, but the …
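Activation checkpointing usually frees far more memory than shrinking the batch alone. A back-of-envelope estimate with assumed, purely illustrative model dimensions (not HLLM-Pixel's actual sizes):

```python
# Rough activation-memory estimate; all dimensions are assumptions.
batch, seq, hidden, layers, fp16_bytes = 8, 2048, 4096, 32, 2
full_gib = batch * seq * hidden * layers * fp16_bytes / 2**30
# With checkpointing, roughly one layer of activations stays live at a time.
ckpt_gib = full_gib / layers
print(full_gib, ckpt_gib)  # 4.0 0.125
```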
-
### Describe the bug
Activating `--gradient_checkpointing` in either the LoRA or DB scripts for SD3 causes: TypeError: layer_norm(): argument 'input' (position 1) must be Tensor, no…
-
```py
import logging
import os
import json
import torch
from datasets import load_from_disk
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel…
```
-
- Selectively recompute the forward pass of some operations in the backward pass to save memory.
- Replace `transformers`'s gradient checkpointing with pipegoose's gradient checkpointing.
**APIs**…
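A minimal sketch of what a selective-recomputation API could look like, built on `torch.utils.checkpoint`. The `selective_checkpoint` helper and the predicate interface are hypothetical illustrations; pipegoose's actual API may differ:

```python
import functools
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

def selective_checkpoint(model, predicate):
    """Hypothetical helper: recompute the forward pass of submodules
    selected by `predicate` during the backward pass to save memory."""
    for module in model.modules():
        if predicate(module):
            fwd = module.forward
            # Route calls through checkpoint so activations are recomputed.
            module.forward = functools.partial(checkpoint, fwd, use_reentrant=False)
    return model

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
# Only recompute the Linear layers; leave everything else as-is.
selective_checkpoint(model, lambda m: isinstance(m, nn.Linear))

x = torch.randn(4, 8, requires_grad=True)
model(x).sum().backward()
```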