gradient-accumulation Search Results

hpcaitech/Open-Sora #558

No interface to start gradient accumulation?

As mentioned as the title, there seems to be no interface for starting gradient accumulation.

Tianhao-Qi updated 3 days ago

OpenGVLab/Ask-Anything #174

Using Gradient Accumulation

Hello Thanks for your great work. I need to use gradient accumulation on batches due to RAM constraints. The training loop involves iterating over two modalities. I am concerned about the implicatio…

bexxnaz updated 1 month ago

google-research/tuning_playbook #69

Why avoid gradient accumulation?

There is this quote: ``` **Gradient accumulation** simulates a larger batch size than the -- 252 | hardware can support and therefore does not provide any throughput 253 | benefits. It should g…

RonanKMcGovern updated 3 weeks ago

huggingface/transformers #31677

Mismatch with epoch when using gradient_accumulation

### System Info - `transformers` version: 4.43.0.dev0 - Platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.35 - Python version: 3.10.14 - Huggingface_hub version: 0.23.4 - Safetensors version: …

SangbumChoi updated 5 days ago

pytorch/torchtitan #292

[Feature] Add gradient accumulation

Gradient accumulation (micro step) could be very useful when we want to have large batch size but with limited number of gpus.

XinDongol updated 2 weeks ago

patrick-kidger/equinox #772

Dropout causes `None`s with gradient accumulation

Dropout is really the bane of equinox it seems. Loose follow-up of #681 - effectively, I'm trying to fix this problem that cropped up a while ago when using `optax.MultiSteps` for gradient accumulatio…

neel04 updated 1 week ago

google/maxtext #607

Question: Gradient Accumulation

Hello, does it support gradient accumulation or microbatches like those in the T5X repository? I didn't find a parameter for this in base.yml, maybe I just didn't see it? Thank you!

thiagolaitz updated 1 month ago

Lightning-AI/litgpt #1474

Gradient Accumulation Step under Multi-node Pretaining

@awaelchli I found that in the `pretrain.py`, the accumulation steps are calculated based on global batch size, device number and micro batch size. This works fine under single-node setting, e.g. glo…

SHUMKASHUN updated 5 days ago

huggingface/accelerate #2866

Accelerate 0.31.0 gradient accumulation bug.

### System Info ```Shell I have updated to accelerate 0.31.0 from 0.30.0 and all my trainings with gradient_accumulation_steps > 1 started to collapse. Please double check that everything is ok. …

nikitabalabin updated 2 weeks ago

NVlabs/VILA #69

About VILADistributedSampler and gradient_accumulation_steps

If we use the VILADistributedSampler (https://github.com/Efficient-Large-Model/VILA/blob/main/llava/train/llava_trainer.py#L274-L281) for Distributed Training, should the `gradient_accumulation_steps`…

dreamerlin updated 5 days ago

1000+ results for gradient-accumulation

1000+ results
for gradient-accumulation